GenCore version 5.1.7
Copyright (c) 1993 - 2006 Biocceleration Ltd.

OM protein - protein search, using sw model

Run on: May 5, 2006, 12:45:45 ; Search time 96 Seconds
(without alignments)
2274.701 Million cell updates/sec

Title: US-10-804-785-2
Perfect score: 2744
Sequence: 1 QSACTLQSETHPPLTWQKCS TVCASGTTCQVLNPYYSQCL 497

Scoring table: BLOSUM62
Gapop 10.0 , Gapext 0.5

Searched: 2443163 seqs, 439378781 residues

Total number of hits satisfying chosen parameters: 2443163

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
Maximum Match 100%
Listing first 45 summaries

Database : A_Geneseq_21:*
1: geneseqp1980s:*
2: geneseqp1990s:*
3: geneseqp2000s:*
4: geneseqp2001s:*
5: geneseqp2002s:*
6: geneseqp2003as:*
7: geneseqp2003bs:*
8: geneseqp2004s:*
9: geneseqp2005s:*



GenCore version 5.1.7
Copyright (c) 1993 - 2006 Biocceleration Ltd.

OM protein - protein search, using sw model

Run on: May 5, 2006, 12:46:45 ; Search time 21 Seconds
(without alignments)
1956.658 Million cell updates/sec

Title: US-10-804-785-2
Perfect score: 2744
Sequence: 1 QSACTLQSETHPPLTWQKCS TVCASGTTCQVLNPYYSQCL 497

Scoring table: BLOSUM62
Gapop 10.0 , Gapext 0.5

Searched: 572060 seqs, 82675679 residues

Total number of hits satisfying chosen parameters: 572060

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
Maximum Match 100%
Listing first 45 summaries

Database : Issued_Patents_AA:*
1: /cgn2_6/ptodata/1/iaa/5_COMB.pep:*
2: /cgn2_6/ptodata/1/iaa/6_COMB.pep:*
3: /cgn2_6/ptodata/1/iaa/H_COMB.pep:*
4: /cgn2_6/ptodata/1/iaa/PCTUS_COMB.pep:*
5: /cgn2_6/ptodata/1/iaa/RE_COMB.pep:*
6: /cgn2_6/ptodata/1/iaa/backfiles1.pep:*

GenCore version 5.1.7
Copyright (c) 1993 - 2006 Biocceleration Ltd.

OM protein - protein search, using sw model

Run on: May 5, 2006, 12:51:20 ; Search time 15 Seconds
(without alignments)
1533.568 Million cell updates/sec

Title: US-10-804-785-2
Perfect score: 2744
Sequence: 1 QSACTLQSETHPPLTWQKCS TVCASGTTCQVLNPYYSQCL 497

Scoring table: BLOSUM62
Gapop 10.0 , Gapext 0.5

Searched: 235405 seqs, 46284737 residues

Total number of hits satisfying chosen parameters: 235405

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
Maximum Match 100%
Listing first 45 summaries

Database : Published_Applications_AA_New:*
1: /SIDS5/ptodata/1/pubpaa/US08_NEW_PUB.pepl:*
2: /SIDS5/ptodata/1/pubpaa/US06_NEW_PUB.pep:*
3: /SIDS5/ptodata/1/pubpaa/US07_NEW_PUB.pep:*
4: /SIDS5/ptodata/1/pubpaa/US08_NEW_PUB.pep:*
5: /SIDS5/ptodata/1/pubpaa/PCT_NEW_PUB.pep:*
6: /SIDS5/ptodata/1/pubpaa/US09_NEW_PUB.pep:*
7: /SIDS5/ptodata/1/pubpaa/US09_NEW_PUB.pepl:*
8: /SIDS5/ptodata/1/pubpaa/US10_NEW_PUB.pep:*
9: /SIDS5/ptodata/1/pubpaa/US10_NEW_PUB.pepl:*
10: /SIDS5/ptodata/1/pubpaa/US11_NEW_PUB.pep:*
11: /SIDS5/ptodata/1/pubpaa/US11_NEW_PUB.pepl:*
12: /SIDS5/ptodata/1/pubpaa/US60_NEW_PUB.pep:*

GenCore version 5.1.7 
Copyright (c) 1993 - 2006 Biocceleration Ltd. 



OM protein - protein search, using sw model 



Run on: May 5, 2006, 12:46:35 ; Search time 41 Seconds 

(without alignments) 
1166.335 Million cell updates/sec 

Title: US-10-804-785-2

Perfect score: 2744

Sequence: 1 QSACTLQSETHPPLTWQKCS TVCASGTTCQVLNPYYSQCL 497

Scoring table: BLOSUM62

Gapop 10.0 , Gapext 0.5

Searched: 283416 seqs, 96216763 residues 



Total number of hits satisfying chosen parameters: 283416 

Minimum DB seq length: 0 

Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0%

Maximum Match 100% 
Listing first 45 summaries 

Database : PIR_80:*
1: pir1:*
2: pir2:*
3: pir3:*
4: pir4:*



GenCore version 5.1.7 
Copyright (c) 1993 - 2006 Biocceleration Ltd. 



OM protein - protein search, using sw model 



Run on: May 5, 2006, 12:45:59 ; Search time 230 Seconds 

(without alignments) 
1524.555 Million cell updates/sec 



Title: US-10-804-785-2

Perfect score: 2744

Sequence: 1 QSACTLQSETHPPLTWQKCS TVCASGTTCQVLNPYYSQCL 497

Scoring table: BLOSUM62 

Gapop 10.0 , Gapext 0.5 

Searched: 2166443 seqs, 705528306 residues 

Total number of hits satisfying chosen parameters: 2166443 

Minimum DB seq length: 0 

Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0% 

Maximum Match 100% 
Listing first 45 summaries 

Database : UniProt_05.80:*

1: uniprot_sprot:*
2: uniprot_trembl:*
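
The "Million cell updates/sec" figure in each search header above appears to equal the query length multiplied by the number of database residues, divided by the search time. The report does not state this formula, so the sketch below treats it as an assumption and only checks it against the five headers. The function name and the tuple layout are illustrative, not part of GenCore.

```python
# Sketch: check the reported "Million cell updates/sec" figures.
# Assumption (not stated in the report): a Smith-Waterman search fills one
# cell per (query residue, database residue) pair.

QUERY_LENGTH = 497  # residues in US-10-804-785-2, from the "Sequence" line

# (database, residues searched, search time in seconds, reported rate)
SEARCHES = [
    ("A_Geneseq_21", 439378781, 96, 2274.701),
    ("Issued_Patents_AA", 82675679, 21, 1956.658),
    ("Published_Applications_AA_New", 46284737, 15, 1533.568),
    ("PIR_80", 96216763, 41, 1166.335),
    ("UniProt_05.80", 705528306, 230, 1524.555),
]


def million_cell_updates_per_sec(query_len, db_residues, seconds):
    """Return the cell-update rate in millions per second."""
    return query_len * db_residues / seconds / 1e6


for name, residues, seconds, reported in SEARCHES:
    computed = million_cell_updates_per_sec(QUERY_LENGTH, residues, seconds)
    print(f"{name}: computed {computed:.3f}, reported {reported:.3f}")
```

Under this assumption the computed rates agree with the reported ones to three decimal places for all five databases.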



Title: US-10-804-785-2



RESULT 8 

US-09-548-938A-10 

Sequence 10, Application US/09548938A 
Patent No. 6573086
GENERAL INFORMATION: 
APPLICANT: EMALFARB, MARK AARON 
APPLICANT: BURLINGAME, RICHARD PAUL 
APPLICANT: OLSON, PHILIP TERRY 

APPLICANT: SINITSYN, ARKADY PANTELEIMONOVICH
APPLICANT: PARRICHE, MARTINE 
APPLICANT: BOUSSON, JEAN CHRISTOPHE 
APPLICANT: PYNNONEN, CHRISTINE MARIE 
APPLICANT: PUNT, PETER JAN 

APPLICANT: VAN-ZEIJL, CORNELIA MARIA JOHANNA

TITLE OF INVENTION: TRANSFORMATION SYSTEM IN THE FIELD OF FILAMENTOUS FUNGI
FILE REFERENCE: 3123-4001

CURRENT APPLICATION NUMBER: US/09/548,938A
CURRENT FILING DATE: 2000-04-13
NUMBER OF SEQ ID NOS: 19
SOFTWARE: PatentIn Ver. 2.1
SEQ ID NO 10 
LENGTH: 526 
TYPE: PRT 

ORGANISM: Chrysosporium lucknowense 
FEATURE:

NAME/KEY: MOD_RES
LOCATION: (249)

OTHER INFORMATION: Variable amino acid
FEATURE:

NAME/KEY: MOD_RES
LOCATION: (365) 

OTHER INFORMATION: Variable amino acid 
US-09-548-938A-10 

Query Match 61.8%; Score 1695; DB 2; Length 526;

Best Local Similarity 60.4%; Pred. No. 4e-124; 

Matches 311; Conservative 68; Mismatches 112; Indels 24; Gaps 10; 

Qy 1 QSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTL 60

hllll :| M III Ihllhll Ih lllllllll hhlllhll I 
Db 18 QNACTLTAENHPSLTWSKCTSGGSCTSVQGSITIDANWRWTHRTDSATNCYEGNKWDTSY 77 

Qy 61 CPDNETCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSA-QKNVGARLYLMASDTTYQE 119 

I I Ul Ihlll hlllhllllllh: llh hhl III III II 

Db 78 CSDGPSCASKCCIDGADYSSTYGITTSGNSLNLKFVTKGQYSTNIGSRTYLMESDTKYQM 137

Qy 120 FTLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRD 179 

I Mlllhllllll I llllllllllllllllhlll I lllllllllllllllll 

Db 138 FQLLGNEFTFDVDVSNLGCGLNGALYFVSMDADGGMSKYSGNKAGAKYGTGYCDSQCPRD 197 

Qy 180 LKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE 239

Illllhllll h hhll I I :|llllllhlllh:: I Mill HI II 
Db 198 LKFINGEANVENWQSSTNDANAGTGKYGSCCSEMDVWEANNMAAAFTPHPCXVIGQSRCE 257 

Qy 240 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGA 299

IMIIIII :|| I ||||ll|:| II II :|ll I hllllhllllll : I 
Db 258 GDSCGGTYSTDRYAGICDPDGCDFNSYRQGNKTFYGKG--MTVDTTKKITVVTQFLKNSA 315

Qy 300 INRYYVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFGG-SSFSDKGGLTQFK 353 

I hlllll : : II : hi ::| II : lllh I 

Db 316 GELSEIKRFYVQNGKVIPNSESTIPGVEGNSITQDWCDRQKAAFGDVTDXQDKGGMVQMG 375 

Qy 354 KATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPN 413 

II U II I I Ihl I h II I III Ihl : : III Ihl hllllhlh^HI 

Db 376 KALAGPMVLVMSIWDDHAVNMLWLDSTWPI-DGAGKPGAERGACPTTSGVPAEVEAEAPN 434 



Qy 414

I MlhlllllM III III III :::| | I ||

Db 435

Qy 463 QSHYGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL 497

II MIII|::|M I I M II :|IIM
Db 492 AKHYEQCGGIGFTGPTQCESPYTCTKLNDWYSQCL 526



Title: US-10-804-785-2

RESULT 10 
US-08-676-166A-3 

; Sequence 3, Application US/08676166A 

; Patent No. 5955270

; GENERAL INFORMATION: 

APPLICANT: Radford, Alan 

APPLICANT: Parish, John H. 

TITLE OF INVENTION: EXPLOITATION OF THE CELLULASE COMPLEX OF 
TITLE OF INVENTION: NEUROSPORA 
NUMBER OF SEQUENCES: 7 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: David A. Jackson, Esq. 

STREET: 411 Hackensack Ave, Continental Plaza, 4th 

STREET: Floor 

CITY: Hackensack 

STATE: New Jersey 

COUNTRY: USA 

ZIP: 07601 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/676,166A

FILING DATE: 15-JUL-1996 

CLASSIFICATION: 435
ATTORNEY/AGENT INFORMATION:

NAME: Jackson Esq., David A. 

REGISTRATION NUMBER: 26,742 

REFERENCE/DOCKET NUMBER: 1321-1-002 
TELECOMMUNICATION INFORMATION: 

TELEPHONE: 201-487-5800 

TELEFAX: 201-343-1684 
; INFORMATION FOR SEQ ID NO: 3:
SEQUENCE CHARACTERISTICS: 

LENGTH: 525 amino acids 

TYPE: amino acid 

STRANDEDNESS: single

TOPOLOGY: linear 
MOLECULE TYPE: protein 
HYPOTHETICAL: NO 
ORIGINAL SOURCE: 

ORGANISM: H. grisea 
US-08-676-166A-3 

Query Match 60.4%; Score 1658; DB 1; Length 525; 

Best Local Similarity 57.5%; Pred. No. 3.1e-121; 

Matches 295; Conservative 77; Mismatches 119; Indels 22; Gaps 7;

Qy 1 QSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTL 60

I Ihl :| M hi:||::|l I h :|:|||lll : Mill II I ::: 

Db 19 QQACSLTTERHPSLSWKKCTAGGQCQTVQASITLDSNWRWTHQVSGSTNCYTGNKWDTSI 78 

Qy 61 CPDNETCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSA-QKNVGARLYLMASDTTYQE 119 



I I :H|:|lhlll I ||||:||:|:M|: III: II hi III : II 

Db 79 CTDAKSCAQNCCVDGADYTSTYGITTNGDSLSLKFVTKGQYSTNVGSRTYLMDGEDKYQT 138

Qy 120 FTLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRD 179

I lllllhlllMI : IIIIIIIIIIIIIIMh|:|l I llllllllllhlllll 

Db 139 FELLGNEFTFDVDVSNIGCGLNGALYFVSMDADGGLSRYPGNKAGAKYGTGYCDAQCPRD 198

Qy 180 LKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE 239 

Ulll|:||:||| hh I I I :|:|IIIIIMIIh:: I mill :| I II 
Db 199 IKFINGEANIEGWTGSTNDPNAGAGRYGTCCSEMDIWEANNMATAFTPHPCTIIGQSRCE 258 

Qy 240 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETS-- 297

II lllllh II I IMIIIhl II II :|ll I hlMlhllllll 

Db 259 GDSCGGTYSNERYAGVCDPDGCDFNSYRQGNKTFYGKG--MTVDTTKKITVVTQFLKDAN 316

Qy 298 ---GAINRYYVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFGG-SSFSDKGGLTQFK 353 

I I hllhl : : II : hi II h llh I 

Db 317 GDLGEIKRFYVQDGKIIPNSESTIPGVEGNSITQDWCDRQKVAFGDIDDFNRKGGMKQMG 376 

Qy 354 KATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPN 413 

II :| lllllhllh :||IIIMhl : : III Ihl hllllhlh::|| 
Db 377 KALAGPMVLVMSIWDDHASNMLWLDSTFPV-DAAGKPGAERGACPTTSGVPAEVEAEAPN 435 

Qy 414 AKVTFSNIKFGPIGST GNPSGGNPPGGNPPGTTTTRRPATTTGSSPGPTQS 464 

^ I lllhllMIII I :||||| II llh Mill :| II 
Db 436 SNVVFSNIRFGPIGSTVAGLPGAGNGGNNGGNPP PPTTTTSSAPATTTTASAGPKAG 492

Qy 465 HYGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL 497 

: MIMh:|M I II II :||||| 

Db 493 RWQQCGGIGFTGPTQCEEPYTCTKLNDWYSQCL 525 



Title: US-10-804-785-2

RESULT 15 
US-10-481-179-2 

Sequence 2, Application US/10481179 
Publication No. US20040197890A1 
GENERAL INFORMATION: 
APPLICANT: Novozymes A/S 
APPLICANT: Lange, Lene 
APPLICANT: Wu, Wenping 
APPLICANT: Aubert, Dominique 
APPLICANT: Landvik, Sara 
APPLICANT: Schnorr, Kirk 
APPLICANT: Clausen, Ib

TITLE OF INVENTION: Polypeptides having cellobiohydrolase I activity and 
TITLE OF INVENTION: polynucleotides encoding same 
FILE REFERENCE: 10129.204-WO
CURRENT APPLICATION NUMBER: US/10/481,179
CURRENT FILING DATE: 2003-12-17
NUMBER OF SEQ ID NOS: 67
SOFTWARE: PatentIn version 3.2
SEQ ID NO 2 
LENGTH: 526 
TYPE: PRT 

ORGANISM: Acremonium thermophilum 
US-10-481-179-2 

Query Match 65.6%; Score 1799; DB 4; Length 526; 

Best Local Similarity 64.5%; Pred. No. 2.8e-128; 

Matches 330; Conservative 57; Mismatches 107; Indels 18; Gaps 7; 

Qy 1 QSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTL 60

I INI U II hi Ihllhll :||| lllllllll :|||||| II I 
Db 18 QQACTLTAENHPTLSWSKCTSGGSCTSVSGSVTIDANWRWTHQVSSSTNCYTGNEWDTSI 77 



Qy 61 CPDNETCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSA-QKNVGARLYLMASDTTYQE 119

I I :|l mill h llhllllhlh MM |:|:| MMIM || 

Db 78 CTDGASCAAACCLDGADYSGTYGITTSGNALSLQFVTQGPYSTNIGSRTYLMASDTKYQM 137 

Qy 120 FTLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRD 179 

MMMIhlMlh I IIIIIIIIIMII llhMI I lllllllllllllllll 

Db 138 FTLLGNEFTFDVDVTGLGCGLNGALYFVSMDEDGGLSKYSGNKAGAKYGTGYCDSQCPRD 197

Qy 180 LKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE 239 

MMIhM M IMh I hi MMIIMhMIIIII I llllllhM M 

Db 198 LKFINGEANNVGWTPSSNDKNAGLGNYGSCCSEMDVWEANSISAAYTPHPCTTIGQTRCE 257 

Qy 240 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGA 299

M IMIII Ml I IIIIIIIM IIMIIMII I IMIMI IIIIM I : 

Db 258 GDDCGGTYSTDRYAGECDPDGCDFNSYRMGNTTFYGKG--MTVDTSKKFTVVTQFLTDSS 315

Qy 300 INRYYVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFGGSS-FSDKGGLTQFK 353

I IMIIIM h : III : M h: M I MM I 

Db 316 GNLSEIKRFYVQNGVVIPNSNSNIAGVSGNSITQAFCDAQKTAFGDTNVFDQKGGLAQMG 375

Qy 354 KATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPN 413

II = lllllllllh llllllllllll : III Ihl hlllll lllhll 

Db 376 KALAQPMVLVMSLWDDHAVNMLWLDSTYPTN-AAGKPGAARGTCPTTSGVPADVESQAPN 434

Qy 414 AKVTFSNIKFGPIGST--GNPSGGNPPGGNPPGTTTTRRPATTTGSSP GPTQSH 465 

Ml MIIMMIIM I I l|: Ml MM IIIIM II MM 

Db 435 SKVIYSNIRFGPIGSTVSGLPGGGSNPGGGSSSTTTTTRPATSTTSSASSGPTGGGTAAH 494 

Qy 466 YGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL 497

MlllllhMIIIIII III II M III 

Db 495 WGQCGGIGWTGPTVCASPYTCQKLNDWYYQCL 526
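
The summary statistics printed before each alignment appear to be simple ratios. "Query Match" matches the hit score divided by the perfect score of 2744. "Best Local Similarity" matches the number of identical positions divided by the total number of alignment columns (matches, conservative substitutions, mismatches and indels). The report does not define these quantities, so the sketch below is an assumption checked only against the figures printed for this result. The function names are illustrative.

```python
# Sketch: recompute the per-result summary percentages from the raw counts.
# Assumed definitions (not stated in the report):
#   Query Match           = score / perfect score
#   Best Local Similarity = matches / (matches + conservative + mismatches + indels)

PERFECT_SCORE = 2744  # self-score of the query, from the search header


def query_match_pct(score, perfect=PERFECT_SCORE):
    """Score of the hit as a percentage of the query's perfect score."""
    return 100.0 * score / perfect


def best_local_similarity_pct(matches, conservative, mismatches, indels):
    """Identical positions as a percentage of all alignment columns."""
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns


# Counts for US-10-481-179-2 as printed above.
print(f"Query Match {query_match_pct(1799):.1f}%")
print(f"Best Local Similarity {best_local_similarity_pct(330, 57, 107, 18):.1f}%")
```

For the counts above this gives 65.6% and 64.5%, the values printed in the report.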



Title: US-10-804-785-2



RESULT 4 
S38794 

cellulose 1,4-beta-cellobiosidase (EC 3.2.1.91) - imperfect fungus (Humicola grisea)
N;Alternate names: beta-glucancellobiohydrolase; exoglucanase
C;Species: Humicola grisea var. thermoidea

C;Date: 10-Sep-1999 #sequence_revision 10-Sep-1999 #text_change 09-Jul-2004
C;Accession: S38794; S08240; A45869
R;Radford, A.

submitted to the EMBL Data Library, June 1991
A;Reference number: S38794

A;Accession: S38794
A;Molecule type: DNA
A;Residues: 1-525 <RAD>

A;Cross-references: UNIPROT:P15828; UNIPARC:UPI000012BE0F; EMBL:X17258; NID:g2760;

PIDN:CAA35159.1; PID:g2761

A;Note: this is a revision to the sequence from reference S08240 
R;de Oliviera Azevedo, M.; Radford, A.
Nucleic Acids Res. 18, 668, 1990

A;Title: Sequence of cbh-1 gene of Humicola grisea var. thermoidea.
A;Reference number: S08240; MUID:90175006; PMID:2308855
A;Accession: S08240
A;Molecule type: DNA
A;Residues: 1-299,'H',301-525 <DEO>

A;Cross-references: UNIPARC:UPI00001729F6; EMBL:X17258
A;Note: the authors translated the codon CAG for residue 87 as His
A;Note: this sequence has been revised in reference S38794
R;Azevedo, M. de O.; Felipe, M.S.S.; Astolfi-Filho, S.; Radford, A.
J. Gen. Microbiol. 136, 2569-2576, 1990
A;Title: Cloning, sequencing and homologies of the cbh-1 (exoglucanase) gene of Humicola gri 
var. thermoidea. 

A;Reference number: A45869; MUID:91178527; PMID:2127803
A;Accession: A45869
A;Status: not compared with conceptual translation
A;Molecule type: DNA

A;Residues: 1-20,'R',22-34,'K',36-86,'H',88-141,'V',143-157,'Y',159-237,'QQH',241-244,'I',246-299,'H',301-525 <AZE>

A;Cross-references: UNIPARC:UPI00001729F7; GB:M64588; GB:X17258
A;Note: this sequence has been revised. See entry S08240

C;Genetics:
A;Gene: cbh-1
A;Introns: 138/1

C;Superfamily: cellulose 1,4-beta-cellobiosidase I; fungal cellulose-binding domain homology

C;Keywords: glycosidase; hydrolase; polysaccharide degradation
F;494-525/Domain: fungal cellulose-binding domain homology <FCB>

Query Match 60.2%; Score 1652; DB 1; Length 525; 

Best Local Similarity 57.3%; Pred. No. 1.8e-91; 

Matches 294; Conservative 76; Mismatches 121; Indels 22; Gaps 7; 



Qy 1 QSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTL 60

I Ihl :| II hi lh:|l I h =|:|lllll : III!! 11 I : : : 

Db 19 QQACSLTTERHPSLSWNKCTAGGQCQTVQASITLDSNWRWTHQVSGSTNCYTGNKWDTSI 78 

Qy 61 CPDNETCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQK-NVGARLYLMASDTTYQE 119 

I I ::|hllhlll I lllhlhhllh III: llhl Ml : II 

Db 79 CTDAKSCAQNCCVDGADYTSTYGITTNGDSLSLKFVTKGQHSTNVGSRTYLMDGEDKYQT 138 

Qy 120 FTLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRD 179 

I lllllhllllll : IMIIIIIIIMIMIhhll I MINI INI hi Ml I 

Db 139 FELLGNEFTFDVDVSNIGCGLNGALYFVSMDADGGLSRYPGNKAGAKYGTGYCDAQCPRD 198

Qy 180 LKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE 239 

MMIhlhlll hh I I I MMIIIIMIIM::: I MMM Ml II 

Db 199 IKFINGEANIEGWTGSTNDPNAGAGRYGTCCSEMDIWEANNMATAFTPHPCTIIGQSRCE 258 

Qy 240 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETS-- 297

II lllllh II I llllllhl II II :||| I hllllhllllll 

Db 259 GDSCGGTYSNERYAGVCDPDGCDFNSYRQGNKTFYGKG--MTVDTTKKITVVTQFLKDAN 316

Qy 298 GAINRYYVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFGG-SSFSDKGGLTQFK 353 

I I IMIIM : : II : IM - II h Mh I 

Db 317 GDLGEIKRFYVQDGKIIPNSESTIPGVEGNSITQDWCDRQKVAFGDIDDFNRKGGMKQMG 376 

Qy 354 KATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPN 413 

II M MIIIIMII: MIIIIIIIM : : III MM IMMIIM|::MI 
Db 377 KALAGPMVLVMSIWDDHASNMLWLDSTFPV-DAAGKPGAERGACPTTSGVPAEVEAEAPN 435 

Qy 414 AKVTFSNIKFGPIGST GNPSGGNPPGGNPPGTTTTRRPATTTGSSPGPTQS 464 

■■ I lllhlllllll I :||||l II III: Mill :| II 
Db 436 SNVVFSNIRFGPIGSTVAGLPGAGNGGNNGGNPP PPTTTTSSAPATTTTASAGPKAG 492

Qy 465 HYGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL 497

: IIMIhMM I I II MMM 

Db 493 RWQQCGGIGFTGPTQCEEPYICTKLNDWYSQCL 525 

Title: US-10-804-785-2



RESULT 6 
S42093 

cellulose 1,4-beta-cellobiosidase (EC 3.2.1.91) - Neurospora crassa
C;Species: Neurospora crassa

C;Date: 20-May-1994 #sequence_revision 10-Nov-1995 #text_change 09-Jul-2004
C;Accession: S42093
R;Taleb, F.; Radford, A. 

submitted to the EMBL Data Library, February 1994 

A;Description: Cloning sequencing and homologies of the CBH-1 (exocellobiohydrolase) gene of
Neurospora crassa.

A;Reference number: S42093

A;Accession: S42093
A;Molecule type: DNA
A;Residues: 1-516 <TAL>

A;Cross-references: UNIPROT:P38676; UNIPARC:UPI000011D714; EMBL:X77778; NID:g456657;
PIDN:CAA54815.1; PID:g456658
C;Genetics:
A;Introns: 227/3

C;Superfamily: cellulose 1,4-beta-cellobiosidase I; fungal cellulose-binding domain homology
C;Keywords: glycosidase; hydrolase; polysaccharide degradation
F;485-516/Domain: fungal cellulose-binding domain homology <FCB> 

Query Match 56.9%; Score 1561; DB 2; Length 516; 

Best Local Similarity 57.5%; Pred. No. 4.7e-86; 

Matches 294; Conservative 62; Mismatches 129; Indels 26; Gaps 10; 



Qy 1 QSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTL 60

I I M MINI: II I I ::|:|||llllll|: II M II I HI 

Db 18 QQAGTLTAKRHPSLTWQKCTRGGCPTLNT-TMVLDANWRWTHATSGSTKCYTGNKWQATL 76 

Qy 61 CPDNETCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLMASDTTYQEF 120 

III ::|l II MM I Mhl II Ih: Ml Mill MM M II 

Db 77 CPDGKSCAANCALDGADYTGTYGITGSGWSLTLQFVTD NVGARAYLMADDTQYQML 132 

Qy 121 TLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDL 180 

M I IMhl MIIIIIMI Mlllll: IIIM Mill llllhllllM 

Db 133 ELLNQELWFDVDMSNIPCGLNGALYLSAMDADGGMRKYPTNKAGAKYATGYCDAQCPRDL 192

Qy 181 KFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEG 240 

IMII lIMM Ihhll Ml IIIIIIIIIMIII M I IIIMIh I MM 

Db 193 KYINGIANVEGWTPSTNDAN-GIGDHGSCCSEMDIWEANKVSTAFTPHPCTTIEQHMCEG 251 

Qy 241 DGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGA- 299

I llllllhlll II lllhl Ihllhlll I |:||: I mill I 
Db 252 DSCGGTYSDDRYGVLCDADGCDFNSYRMGNTTFYGEGK--TVDTSSKFTVVTQFIKDSAG 309

Qy 300 INRYYVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFGG-SSFSDKGGLTQFKK 354 

I =11111 : : : III = =1 : = = II h MM I | 

Db 310 DLAEIKAFYVQNGKVIENSQSNVDGVSGNSITQSFCKSQKTAFGDIDDFNKKGGLKQMGK 369 

Qy 355 ATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPNA 414 

I ■■ IIMIhllh lllllllllll III III |:|||lh|:= MM 

Db 370 ALAQAMVLVMSIWDDHAANMLWLDSTYP VPKVPGAYRGSGPTTSGVPAEVDANAPNS 426 

Qy 415 KVTFSNIKFGPI GSTGNPSGGNPPGGNPPGTTTTRRPATTTGSSP-GPTQSHY 466 

II lllllll : Ihl I II I ::| : :|:| hi I :|: 

Db 427 KVAFSNIKFGHLGISPFSGGSSGTPP-SNPSSSASPTSSTAKPSSTSTASNPSGTGAAHW 485 

Qy 467 GQCGGIGYSGPTVCASGTTCQVLNPYYSQCL 497 

lllllhlMI I M : lllh 

Db 486 AQCGGIGFSGPTTCPEPYTCAKDHDIYSQCV 516 

Title: US-10-804-785-2

RESULT 13
JE0313
exoglucanase (EC 3.2.-.-) - imperfect fungus (Humicola grisea)
C;Species: Humicola grisea

C;Date: 05-Feb-1999 #sequence_revision 05-Feb-1999 #text_change 09-Jul-2004
C;Accession: JE0313

R;Takashima, S.; Iikura, H.; Nakamura, A.; Hidaka, M.; Masaki, H.; Uozumi, T.
J. Biochem. 124, 717-725, 1998

A;Title: Isolation of the gene and characterization of the enzymatic properties of a major
exoglucanase of Humicola grisea without a cellulose-binding domain.
A;Reference number: JE0313; MUID:98429588; PMID:9756616
A;Accession: JE0313
A;Status: preliminary

A;Molecule type: DNA
A;Residues: 1-451 <TAK>

A;Cross-references: UNIPROT:O93780; UNIPARC:UPI000005E865; DDBJ:AB003105

C;Superfamily: cellulose 1,4-beta-cellobiosidase I; fungal cellulose-binding domain homology
C;Keywords: glycosidase; hydrolase

Query Match 45.2%; Score 1241.5; DB 2; Length 451; 

Best Local Similarity 52.0%; Pred. No. 4.8e-67;

Matches 226; Conservative 84; Mismatches 114; Indels 11; Gaps 9;

Qy 1 QSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTL 60

I I h :| II :|h:|l I I I lllllllll I h llhll hi

Db 23 QQAGTITAENHPRMTWKRCSGPGNCQTVQGEVVIDANWRWLH--NNGQNCYEGNKWTSQ- 79

Qy 61 CPDNETCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQ-KNVGARLYLMASDTTYQE 119

I Ih I III! I INI :|lhl|:: llh hhl lllh II

Db 80 CSSATDCAQRCALDGANYQSTYGASTSGDSLTLKFVTKHEYGTNIGSRFYLMANQNKYQM 139

Qy 120 FTLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRD 179

III-' II I :| II hh: I hi llllhl: llh: I hi I I I I I I I I I I h I I II
Db 140 FTLMNNEFAFDVDLSKVECGINSALYFVAMEEDGGMASYPSNRAGAKYGTGYCDAQCARD 199

Qy 180 LKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQ-EIC 238

III! hl|:|ll Ihh I hi hlhhhihh : III! I : : II
Db 200 LKFIGGKANIEGWRPSTNDPNAGVGPMGACCAEIDVWESNAYAYAFTPHACGSKNRYHIC 259

Qy 239 EGDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSG 298
I : llllllhh I II :|||:||l|:|l III I hll U llhUI :

Db 260

Qy 299

Ihl : I I : :: : ||: | • : |:: ||

Db 317

Qy 357 SGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPNAKV 416

: lllllhllh::|lllllhll I : II III hllllhlhl hhl

Db 377 TIPMVLVMSIWDDHHSNMLWLDSSYPP-EKAGLPGGDRGPCPTTSGVPAEVEAQYPDAQV 435

Qy 417 TFSNIKFGPIGSTGN 431

Hlhlllllll I
Db 436 VWSNIRFGPIGSTVN 450



Title: US-10-804-785-2



RESULT 8 
Q8WZJ4_PENFN 

ID Q8WZJ4_PENFN PRELIMINARY; PRT; 529 AA. 

AC Q8WZJ4;

DT 01-MAR-2002 (TrEMBLrel. 20, Created)

DT 01-MAR-2002 (TrEMBLrel. 20, Last sequence update)

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update)

DE Xylanase/cellobiohydrolase precursor (EC 3.2.1.91). 

GN Name=xynA; 

OS Penicillium funiculosum. 

OC Eukaryota; Fungi; Ascomycota; Pezizomycotina; Eurotiomycetes; 

OC Eurotiales; Trichocomaceae; mitosporic Trichocomaceae; Penicillium.

OX NCBI_TaxID=28572;

RN [1] 

RP NUCLEOTIDE SEQUENCE. 

RX MEDLINE=22548831; PubMed=12664153;

RA Alcocer M., Furniss C., Kroon P.A., Campbell M., Archer D.B.;

RT "Comparison of modular and non-modular xylanases as carrier proteins

RT for the efficient secretion of heterologous proteins from Penicillium

RT funiculosum.";

RL Appl. Microbiol. Biotechnol. 60:726-732(2003).

RN [2] 




RP NUCLEOTIDE SEQUENCE.

RA Furniss C.S.M., Williamson G., Kroon P.A.;

RT "The substrate specificity and susceptibility to wheat inhibitor 

RT proteins of Penicillium funiculosum xylanases from a commercial enzyme 

RT preparation.";

RL J. Sci. Food Agric. 85:574-582(2005).

CC -!- CATALYTIC ACTIVITY: Hydrolysis of 1,4-beta-D-glucosidic linkages

CC in cellulose and cellotetraose, releasing cellobiose from the non-

CC reducing ends of the chains. 

DR EMBL; AJ312295; CAC85737.1; -; Genomic_DNA. 

DR HSSP; Q09431; 1GPI.

DR GO; GO:0005576; C:extracellular region; IEA.

DR GO; GO:0016162; F:cellulose 1,4-beta-cellobiosidase activity; IEA.

DR GO; GO:0030248; F:cellulose binding; IEA.

DR GO; GO:0005975; P:carbohydrate metabolism; IEA.

DR GO; GO:0030245; P:cellulose catabolism; IEA.

DR GO; GO:0000272; P:polysaccharide catabolism; IEA.

DR GO; GO:0045493; P:xylan catabolism; IEA.

DR InterPro; IPR000254; CBD_fun.

DR InterPro; IPR001722; Glyco_hydro_7.

DR Pfam; PF00734; CBM_1; 1.

DR Pfam; PF00840; Glyco_hydro_7; 1.

DR PRINTS; PR00734; GLHYDRLASE7.

DR ProDom; PD001821; CBD_fungal; 1.

DR ProDom; PD186135; Glyco_hydro_7; 1.

DR SMART; SM00236; fCBD; 1. 

DR PROSITE; PS00562; CBD_FUNGAL; 1. 

KW Carbohydrate metabolism; Cellulose degradation; Glycosidase; 

KW Hydrolase; Polysaccharide degradation; Signal; Xylan degradation. 

FT SIGNAL 1 24 Potential. 

FT CHAIN 25 529 xylanase/cellobiohydrolase.

SQ SEQUENCE 529 AA; 55048 MW; 95232F53577B6416 CRC64;

Query Match 63.1%; Score 1730.5; DB 2; Length 529; 

Best Local Similarity 63.1%; Pred. No. 2.9e-103; 

Matches 323; Conservative 55; Mismatches 111; Indels 23; Gaps 9; 

Qy 1 QSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTL 60

I I -'WW hi I llhll :h: :|||||| I IHIIII lllh: : 
Db 26 QQIGTYTAETHPSLSWSTCKSGGSCTTNSGAITLDANWRWVHGVNTSTNCYTGNTWNTAI 85 

Qy 61 CPDNETCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLMASDTTYQEF 120 

I Hh:| MM h llhlMIIII : III I llhl MM M II I 
Db 86 CDTDASCAQDCALDGADYSGTYGITTSGNSLRLNFVTGS NVGSRTYLMADNTHYQIF 142 

Qy 121 TLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDL 180 

M 11 = 1 INI lllllllllllhlllllllllll I llhll lllllllllll 

Db 143 DLLNQEFTFTVDVSNLPCGLNGALYFVTMDADGGVSKYPNNKAGAQYGVGYCDSQCPRDL 202 

Qy 181 KFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEG 240 

III I II II I II Ihlhlllll llllhhillllllllllllllll I I :| 

Db 203 KFIAGQANVEGWTPSTNNSNTGIGNHGSCCAELDIWEANSISEALTPHPCDTPGLTVCTA 262 
Qy 241 DGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFET 296

I iiiiii III iiiiiiiihiiiiii I III I hiiii mill I 

Db 263 DDCGGTYSSNRYAGTCDPDGCDFNPYRLGVTDFYGSGK--TVDTTKPFTVVTQFVTDDGT 320

Qy 297 -SGA INRYYVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFG-GSSFSDKGGLTQ 351 

Ih I llllllll Ih::: III =1 hi II : II Uh: III 
Db 321 SSGSLSEIRRYYVQNGVVIPQPSSKISGISGNVINSDFCAAELSAFGETASFTNHGGLKN 380

Qy 352 FKKATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQS 411 

I llllllllllll llllllllll III MM MM hll I Mill 

Db 381 MGSALEAGMVLVMSLWDDYSVNMLWLDSTYPANET-GTPGAARGSCPTTSGNPKTVESQS 439

Qy 412 PNAKVTFSNIKFGPIGSTGNPSGGNPPGGN PPGTTTTRRPATTTGSSPGPT- -QSH 465 

I Ihll II II III Ih Mhh hi h I :| 



Db 440 GSSYVVFSDIKVGPFNSTF--SGGTSTGGSTTTTASGTTSTKASTTSTSSTSTGTGVAAH 497

Qy 466 YGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL 497 

h:|ll lllllll hllllllll 
Db 498 WGQCGGQGWTGPTTCASGTTCTVVNPYYSQCL 529

Title: US-10-804-785-2



RESULT 14 
Q12621_HUMGT

ID Q12621_HUMGT PRELIMINARY; PRT; 525 AA.

AC Q12621;

DT 01-NOV-1996 (TrEMBLrel. 01, Created)

DT 01-NOV-1996 (TrEMBLrel. 01, Last sequence update)

DT 01-MAR-2004 (TrEMBLrel. 26, Last annotation update)

DE Cellulase (EC 3.2.1.91).

GN Name=cbh-1; 

OS Humicola grisea var. thermoidea. 

OC Eukaryota; Fungi; Ascomycota; mitosporic Ascomycota; Humicola. 

OX NCBI_TaxID=5528;

RN [1]

RP NUCLEOTIDE SEQUENCE.

RC STRAIN=IFO9854;

RA Takashima S., Nakamura A., Hidaka M., Masaki H., Uozumi T.;

RT "Cloning, sequencing, and expression of the cellulase genes of 

RT Humicola grisea var. thermoidea."; 

RL Submitted (JUL-1995) to the EMBL/GenBank/DDBJ databases. 

CC -!- FUNCTION: The biological conversion of cellulose to glucose 

CC generally requires three types of hydrolytic enzymes: (1) 

CC Endoglucanases which cut internal beta-1,4-glucosidic bonds; (2)

CC Exocellobiohydrolases that cut the dissaccharide cellobiose from

CC the nonreducing end of the cellulose polymer chain; (3) Beta-1,4-

CC glucosidases which hydrolyze the cellobiose and other short cello-

CC oligosaccharides to glucose (By similarity).

CC -!- CATALYTIC ACTIVITY: Hydrolysis of 1,4-beta-D-glucosidic linkages
CC in cellulose and cellotetraose, releasing cellobiose from the non-

CC reducing ends of the chains.

DR EMBL; D63515; BAA09785.1; -; Genomic_DNA.

DR HSSP; Q09431; 1GPI.

DR GO; GO:0005576; C:extracellular region; IEA.

DR GO; GO:0016162; F:cellulose 1,4-beta-cellobiosidase activity; IEA.

DR GO; GO:0030248; F:cellulose binding; IEA.

DR GO; GO:0005975; P:carbohydrate metabolism; IEA.

DR GO; GO:0030245; P:cellulose catabolism; IEA.

DR GO; GO:0000272; P:polysaccharide catabolism; IEA.

DR InterPro; IPR000254; CBD_fun.

DR InterPro; IPR001722; Glyco_hydro_7.

DR Pfam; PF00734; CBM_1; 1.

DR Pfam; PF00840; Glyco_hydro_7; 1.

DR PRINTS; PR00734; GLHYDRLASE7.

DR ProDom; PD001821; CBD_fungal; 1.

DR ProDom; PD186135; Glyco_hydro_7; 1.

DR SMART; SM00236; fCBD; 1. 

DR PROSITE; PS00562; CBD_FUNGAL; 1. 

KW Carbohydrate metabolism; Cellulose degradation; Glycosidase; 

KW Hydrolase; Polysaccharide degradation. 

SQ SEQUENCE 525 AA; 55722 MW; A2E6E5F40F6P3BB0 CRC64;



Query Match 60.4%; Score 1658; DB 2; Length 525; 

Best Local Similarity 57.5%; Pred. No. 1.3e-98;

Matches 295; Conservative 77; Mismatches 119; Indels 22; Gaps 7;

Qy 1 QSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTL 60

I Ihl ^1 II h|:|h:|| I h :hllllll : Mill 1| | 

Db 19 QQACSLTTERHPSLSWKKCTAGGQCQTVQASITLDSNWRWTHQVSGSTNCYTGNKWDTSI 78 



Qy 61 CPDNETCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSA-QKNVGARLYLMASDTTYQE 119

I nHhIlhlll I lllhlhhill: III: I I hi III : II 

Db 79 CTDAKSCAQNCCVDGADYTSTYGITTNGDSLSLKFVTKGQYSTNVGSRTYLMDGEDKYQT 138 

Qy 120 FTLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRD 179 

I lllllhllllll : IIIIIIIIIIMIIMhhIl I llllllllllhlllll 

Db 139 FELLGNEFTFDVDVSNIGCGLNGALYFVSMDADGGLSRYPGNKAGAKYGTGYCDAQCPRD 198 

Qy 180 LKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE 239 

OlllhlhMI hh I I I :|:|||IMIIIIh:: I llllll Ul M 
Db 199 IKFINGEANIEGWTGSTNDPNAGAGRYGTCCSEMDIWEANNMATAFTPHPCTIIGQSRCE 258

Qy 240 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETS-- 297

M lllllh II I IIIIMhl II II :||l I hllllhllllll 
Db 259 GDSCGGTYSNERYAGVCDPDGCDFNSYRQGNKTFYGKG--MTVDTTKKITVVTQFLKDAN 316

Qy 298 ---GAINRYYVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFGG-SSFSDKGGLTQFK 353 

I II --I I hi : : II : hi II h llh I 

Db 317 GDLGEIKRFYVQDGKIIPNSESTIPGVEGNSITQDWCDRQKVAFGDIDDFNRKGGMKQMG 376 

Qy 354 KATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPN 413

II U IIIM|:|lh Hllllllhl : : III Ihl hllllhlh::!! 

Db 377 KALAGPMVLVMSIWDDHASNMLWLDSTFPV-DAAGKPGAERGACPTTSGVPAEVEAEAPN 435 

Qy 414 AKVTFSNIKFGPIGST GNPSGGNPPGGNPPGTTTTRRPATTTGSSPGPTQS 464 

^ I lllhlllllll I :||||l II llh Mill :| II 
Db 436 SNVVFSNIRFGPIGSTVAGLPGAGNGGNNGGNPP PPTTTTSSAPATTTTASAGPKAG 492

Qy 465 HYGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL 497

: lllllhHIl I II II :||IM 

Db 493 RWQQCGGIGFTGPTQCEEPYTCTKLNDWYSQCL 525 

Title: US-10-804-785-2

RESULT 15 
GUX1_HUMGT 

ID GUX1_HUMGT STANDARD; PRT; 525 AA.

AC P15828;

DT 01-APR-1990 (Rel. 14, Created)

DT 01-FEB-1996 (Rel. 33, Last sequence update)

DT 10-MAY-2005 (Rel. 47, Last annotation update)

DE Exoglucanase I precursor (EC 3.2.1.91) (Exocellobiohydrolase I) (1,4-

DE beta-cellobiohydrolase) (Beta-glucancellobiohydrolase).

GN Name=CBH-1;

OS Humicola grisea var. thermoidea.

OC Eukaryota; Fungi; Ascomycota; mitosporic Ascomycota; Humicola.

OX NCBI_TaxID=5528;

RN [1]

RP NUCLEOTIDE SEQUENCE.

RX MEDLINE=90175006; PubMed=2308855;

RA de Oliviera Alzevedo M., Radford A.; 

RT "Sequence of cbh-1 gene of Humicola grisea var. thermoidea."; 

RL Nucleic Acids Res. 18:668-668(1990). 

CC -!- FUNCTION: The biological conversion of cellulose to glucose 

CC generally requires three types of hydrolytic enzymes: (1) 

CC Endoglucanases which cut internal beta-1,4-glucosidic bonds; (2) 

CC Exocellobiohydrolases that cut the disaccharide cellobiose from 

CC the nonreducing end of the cellulose polymer chain; (3) Beta-1,4- 

CC glucosidases which hydrolyze the cellobiose and other short cello- 

CC oligosaccharides to glucose. 

CC -!- CATALYTIC ACTIVITY: Hydrolysis of 1,4-beta-D-glucosidic linkages 

CC in cellulose and cellotetraose, releasing cellobiose from the non- 

CC reducing ends of the chains. 

CC -!- SIMILARITY: Belongs to the glycosyl hydrolase 7 (cellulase C) 

CC family. 

CC -!- SIMILARITY: Contains 1 CBM1 (fungal-type carbohydrate-binding) 

CC domain. 

CC -------------------------------------------------------------------------- 

CC This Swiss-Prot entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use as long as its content is in no way modified and this statement is not 

CC removed. 

CC 

DR EMBL; X17258; CAA35159.1; -; Genomic_DNA. 

DR PIR; S38794; S38794. 

DR HSSP; Q09431; 1GPI. 

DR InterPro; IPR000254; CBD_fun. 

DR InterPro; IPR001722; Glyco_hydro_7. 

DR Pfam; PF00734; CBM_1; 1. 

DR Pfam; PF00840; Glyco_hydro_7; 1. 

DR PRINTS; PR00734; GLHYDRLASE7. 

DR ProDom; PD001821; CBD_fungal; 1. 

DR ProDom; PD186135; Glyco_hydro_7; 1. 

DR SMART; SM00236; fCBD; 1. 

DR PROSITE; PS00562; CBD_FUNGAL; 1. 

KW Carbohydrate metabolism; Cellulose degradation; Glycoprotein; 

KW Glycosidase; Hydrolase; Polysaccharide degradation; Signal. 



FT   SIGNAL        1     18       Potential.
FT   CHAIN        19    525       Exoglucanase I.
FT   DOMAIN      490    525       CBM1.
FT   REGION       19    467       Catalytic.
FT   REGION      468    489       Linker.
FT   ACT_SITE    231    231       Nucleophile (By similarity).
FT   ACT_SITE    236    236       Proton donor (By similarity).
FT   CARBOHYD    289    289       N-linked (GlcNAc...) (Potential).
FT   DISULFID    497    514       By similarity.
FT   DISULFID    508    524       By similarity.
SQ   SEQUENCE   525 AA;  55694 MW;  A6684D4CF881E090 CRC64;



Query Match 60.2%; Score 1652; DB 1; Length 525; 

Best Local Similarity 57.3%; Pred. No. 3.3e-98; 

Matches 294; Conservative 76; Mismatches 121; Indels 22; Gaps 7; 

Qy 1 QSACTLQSETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSSTNCYDGNTWSSTL 60 

I Ihl :| II hi lh:|| I h :hllllll : Mill II I ::: 

Db 19 QQACSLTTERHPSLSWNKCTAGGQCQTVQASITLDSNWRWTHQVSGSTNCYTGNKWDTSI 78 

Qy 61 CPDNETCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQK-NVGARLYLMASDTTYQE 119 

I nUh-llhlll I lllhlhhilh llh I I hi III : II 

Db 79 CTDAKSCAQNCCVDGADYTSTYGITTNGDSLSLKFVTKGQHSTNVGSRTYLMDGEDKYQT 138 



Qy 120 FTLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRD 179 

I II Nihil Mil : llllllllllllllllhhil I IIIIIIMIIhlllll 

Db 139 FELLGNEFTFDVDVSNIGCGLNGALYFVSMDADGGLSRYPGNKAGAKYGTGYCDAQCPRD 198 

Qy 180 LKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE 239 

MMIhlhlM hh I I I MMIMIIMIIh:: I IIIMI HI M 

Db 199 IKFINGEANIEGWTGSTNDPNAGAGRYGTCCSEMDIWEANNMATAFTPHPCTIIGQSRCE 258 



Qy 240 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFETS-- 297 

II lllllh II I llllllhl M II :||| I hllMhIIIIII 

Db 259 GDSCGGTYSNERYAGVCDPDGCDFNSYRQGNKTFYGKG--MTVDTTKKITWTQFLKDAN 316 

Qy 298 ---GAINRYYVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFGG-SSFSDKGGLTQFK 353 

I I hllhl : : II : hi - M |: Mh I 

Db 317 GDLGEIKRFYVQDGKIIPNSESTIPGVEGNSITQDWCDRQKVAFGDIDDFNRKGGMKQMG 376 

Qy 354 KATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPN 413 

II =1 lllllhlll: :||||lllhl : = III Ihl hllllhlh::|| 

Db 377 KALAGPMVLVMSIWDDHASNMLWLDSTFPV-DAAGKPGAERGACPTTSGVPAEVEAEAPN 435 



Qy 414 AKVTFSNIKFGPIGST - -GNPSGGNPPGGNPPGTTTTRRPATTTGSSPGPTQS 464 

I lllhlllllll I :MIII II III: Hill :| || 

Db 436 SNWFSNIRFGPIGSTVAGLPGAGNGGNNGGNPP PPTTTTSSAPATTTTASAGPKAG 492 

Qy 465 HYGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL 497 

IllllhUII I I II :||IM 

Db 493 RWQQCGGIGFTGPTQCEEPYICTKLNDWYSQCL 525 






RESULT 132 
ABJ26885 

ID ABJ26885 standard; protein; 526 AA. 

XX 

AC ABJ26885; 
XX 

DT 08-MAY-2003 (first entry) 
XX 

DE Cellobiohydrolase I activity protein SEQ ID No 2 . 
XX 

KW Cellobiohydrolase; enzyme; DNA shuffling; ethanol; biomass; 

KW cellobiohydrolase I; EC 3.2.1.91. 

XX 

OS Acremonium thermophilum. 
XX 

PN WO2003000941-A2. 
XX 

PD 03-JAN-2003. 
XX 

PF 26-JUN-2002; 2002WO-DK000429. 
XX 

PR 26-JUN-2001; 2001DK-00001000. 

XX 

PA (NOVO ) NOVOZYMES AS. 
XX 

PI Lange L, Wu W, Aubert D, Landvik S, Schnorr KM, Clausen IG; 

XX 

DR WPI; 2003-278244/27. 

DR N-PSDB; ABT23503 . 

XX 

PT New polypeptide with cellobiohydrolase I activity, useful in producing 

PT ethanol from biomass . 

XX 

PS Claim 4; Page 111-113; 199pp; English. 
XX 

CC The invention relates to a novel polypeptide comprising: part of any of 

CC 21 amino acid sequences; an amino acid sequence at least 70% identical to 

CC a polypeptide encoded by a cellobiohydrolase gene; an amino acid sequence 

CC at least 80% identical to the polypeptide encoded by 21 nucleotide 

CC sequences; a polypeptide encoded by a nucleotide sequence which 

CC hybridises with a probe selected from complementary strands of 55 

CC nucleotide sequences; or a fragment of the aforementioned structures. The 

CC polynucleotides of the invention are useful in a method of DNA shuffling. 

CC The polypeptides are useful in a method for producing ethanol from 

CC biomass comprising contacting the biomass with the polypeptides. This 

CC sequence represents a protein with cellobiohydrolase I activity of the 

CC invention 

XX 

SQ Sequence 526 AA; 

Query Match 65.7%; Score 1799; DB 6; Length 526; 

Best Local Similarity 64.5%; Pred. No. 4e-108; 

Matches 330; Conservative 57; Mismatches 107; Indels 18; Gaps 7; 
Qy 1 QSACTLQSETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSSTNCYDGNTWSSTL 60 

I MM M M hi MMMMI MM lllllllll MMMI II I 

Db 18 QQACTLTAENHPTLSWSKCTSGGSCTSVSGSVTIDANWRWTHQVSSSTNCYTGNEWDTSI 77 

Qy 61 CPDNEXCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSA-QKNVGARLYLMASDTTYQE 119 

I I Ml IIIMI h IIIMIIIIMI: MM IMM MMMI II 

Db 78 CTDGASCAAACCLDGADYSGTYGITTSGNALSLQFVTQGPYSTNIGSRTYLMASDTKYQM 137 

Qy 120 FTLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRD 179 

lllllllhllllh I llillllllllll llhlll I lllllllllllllllll 



Db 138 FTLLGNEFTFDVDVTGLGCGLNGALYFVSMDEDGGLSKYSGNKAGAKYGTGYCDSQCPRD 197 



Qy 180 LKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE 239 

Illllhll II lllh I hi :||lllllhllllll| i llllllhll II 

Db 198 LKFINGEANNVGWTPSSNDKNAGLGNYGSCCSEMDVWEANSISAAYTPHPCTTIGQTRCE 257 

Qy 240 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFETSGA 299 

II llllll :|l I llllllhl Ihllhlll I hlhll IIMII I : 

Db 258 GDDCGGTYSTDRYAGECDPDGCDFNSYRMGNTTFYGKG--MTVDTSKKFTVVTQFLTDSS 315 

Qy 300 INRYYVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFGGSS-FSDKGGLTQFK 353 

I hllllll h : III = :| h: II I MM I 

Db 316 GNLSEIKRFYVQNGWIPNSNSNIAGVSGNSITQAFCDAQKTAFGDTNVFDQKGGLAQMG 375 

Qy 354 KATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPN 413 

II ^ Mlllllllh IIIIIIMMII : III Ihl hlllll lllhll 

Db 376 KALAQPMVLVMSLWDDHAVNMLWLDSTYPTN-AAGKPGAARGTCPTTSGVPADVESQAPN 434 

Qy 414 AKVTFSNIKFGPIGST--GNPSGGNPPGGNPPGTTTTRRPATTTGSSP GPTQSH 465 

Ml Ml hill MM I I Ih III Mil MUM II MM 

Db 435 SKVIYSNIRFGPIGSTVSGLPGGGSNPGGGSSSTTTTTRPATSTTSSASSGPTGGGTAAH 494 

Qy 466 YGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL 497 

MlllllhMIIIIII III II M III 

Db 495 WGQCGGIGWTGPTVCASPYTCQKLNDWYYQCL 526 
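
As a spot check on a single alignment row, the identical positions in the last Qy/Db pair above can be counted directly. The sketch below assumes the two strings are transcribed correctly from that row (Qy 466 to 497 against Db 495 to 526); the helper name is hypothetical and is not part of GenCore.

```python
def count_identities(query_row, db_row):
    """Count columns where both rows carry the same residue.

    Gap characters ('-') are never counted as identities.
    """
    if len(query_row) != len(db_row):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for q, d in zip(query_row, db_row) if q == d and q != "-")


qy = "YGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL"  # Qy 466..497
db = "WGQCGGIGWTGPTVCASPYTCQKLNDWYYQCL"  # Db 495..526
print(count_identities(qy, db))  # 23
```

Summing such per-row counts over a complete, correctly transcribed alignment would be expected to give the Matches figure in the statistics line, although several rows in this listing are too damaged for that to be verified here.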

RESULT 134 
AAB81926 

ID AAB81926 Standard; protein; 529 AA. 
XX 

AC AAB81926; 

XX 

DT 25-JUN-2001 (first entry) 
XX 

DE Acremonium cellulolyticus cellobiohydrolase 1 precursor. 
XX 

KW Cellobiohydrolase 1; cbhl; promoter; protein production. 

XX 

OS Acremonium cellulolyticus. 
XX 

FH Key Location/Qualifiers 

FT Peptide 1..26 

FT /label= signal_peptide 

FT Protein 27..529 

FT /label= mature_cellobiohydrolase 
XX 

PN JP2001017180-A. 

XX 

PD 23-JAN-2001. 
XX 

PF 06-JUL-1999; 99JP-00191221 . 

XX 

PR 06-JUL-1999; 99JP-00191221 . 
XX 

PA (MEIJ ) MEIJI SEIKA KAISHA LTD. 

PA (AGEN ) AGENCY OF IND SCI & TECHNOLOGY. 

XX 

DR WPI; 2001-294133/31. 

DR N-PSDB; AAF85588. 

XX 

PT New promoter useful for expression of a protein. 

XX 

PS Disclosure; Page 12-14; 22pp; Japanese. 
XX 

CC The present invention provides a promoter capable of causing the 

CC expression of a gene connected downstream. It can be used for expressing 

CC a protein in a large amount. The present sequence is the Acremonium 

CC cellulolyticus cellobiohydrolase 1 precursor (cbh1) protein 

XX 

SQ Sequence 529 AA; 

Query Match 63.6%; Score 1741.5; DB 4; Length 529; 

Best Local Similarity 62.9%; Pred. No. 2.1e-104; 

Matches 324; Conservative 52; Mismatches 110; Indels 29; Gaps 8; 

Qy 1 QSACTLQSETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSSTNCYDGNTWSSTL 60 

I I :|IM hi I llhll :h: Hlllll I hlllll lllhl : 
Db 26 QQIGTYTAETHPSLSWSTCKSGGSCTTNSGAITLDANWRWVHGVNTSTNCYTGNTWNSAI 85 

Qy 61 CPDNEXCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLMASDTTYQEF 120 

I : :|h:| MM h llhllMIII : Ml I llhl MM M II I 
Db 86 CDTDASCAQDCALDGADYSGTYGITTSGNSLRLNFVTGS NVGSRTYLMADNTHYQIF 142 

Qy 121 TLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDL 180 

M Ihi MM lllllllllllhllllllllMI I llhll IMMMMM 

Db 143 DLLNQEFTFTVDVSHLPCGLNGALYFVTMDADGGVSKYPNNKAGAQYGVGYCDSQCPRDL 202 

Qy 181 KFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEG 240 

III llllllll lllllllllil IhlhhIIIIIIIIIIIIIIIII I I :| 
Db 203 KFIAGQANVEGWTPSSNNANTGIGNHGACCAELDIWEANSISEALTPHPCDTPGLSVCTT 262 

Qy 241 DGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFETS 297 

I MMI! Ml llllllllhlllMI I III I hllll llllll h 

Db 263 DACGGTYSSDRYAGTCDPDGCDFNPYRLGVTDFYGSGK--TVDTTKPFTWTQFVTNDGT 320 

Qy 298 GAINRYYVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFGG-SSFSDKGGLTQ 351 

I llllllll Ih::: III M III II : Ml MM MM 
Db 321 STGSLSEIRRYYVQNGWIPQPSSKISGISGNVINSDYCAAEISTFGGTASFSKHGGLTN 380 

Qy 352 FKKATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQS 411 

MIIIIIIIIM IIIIIIIIIIM I MM Ihhhil I Ihll 

Db 381 MAAGMEAGMVLVMSLWDDYAVNMLWLDSTYPTNAT-GTPGAARGTCATTSGDPKTVEAQS 439 

Qy 412 PNAKVTFSNIKFGPIGSTGNPSGGNPPGGNPPGTTTTRRPATTTGSSPGPTQS 464 

lllhh II II llh Ih MM III :| I I 

Db 440 GSSYVTFSDIRVGPFNSTF--SGGSSTGGS TTTTASRTTTTSASSTSTSSTSTGTGV 494 

Qy 465 --HYGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL 497 

hlllll hMII I Mill hllllllll 

Db 495 AGHWGQCGGQGWTGPTTCVSGTTCTWNPYYSQCL 529 

RESULT 137 
AAB47783 

ID AAB47783 Standard; protein; 526 AA. 
XX 

AC AAB47783; 
XX 

DT 13-MAR-2002 (first entry) 
XX 

DE Chrysosporium CBH1. 
XX 

KW Glycosyl hydrolase; family 7; family 10; CBH1; Xyl1; fermentation; 
KW promoter; terminator; glyceraldehyde phosphate dehydrogenase; GPD1. 
XX 

OS Chrysosporium sp. 
XX 

FH Key Location/Qualifiers 

FT Peptide 1..19 

FT /label= signal_peptide 

FT Protein 20..526 

FT /label= mature_protein 

FT Misc-difference 249 

FT /note= "Encoded by ACC" 

FT Misc-difference 365 

FT /note= "Encoded by TTN" 

FT Binding-site 496..526 

FT /label= Cellulose_binding_domain 

XX 

PN WO200179507-A2. 
XX 

PD 25-OCT-2001. 
XX 

PF 17-APR-2001; 2001WO-NL000301 . 
XX 

PR 13-APR-2000; 2000EP-00201343 . 
XX 

PA (EMAL/) EMALFARB M A. 
XX 

PI Emalfarb MA, Punt PJ, Van Zeijl CMJ; 
XX 

DR WPI; 2002-066369/09. 
DR N-PSDB; AAI72045. 
XX 

PT New glycosyl hydrolase family 7, glycosyl hydrolase family 10 and 

PT glyceraldehyde phosphate dehydrogenase genes from the filamentous fungus 

PT Chrysosporium useful for the microbial production of these proteins. 

XX 

PS Claim 1; Page 34; 43pp; English. 
XX 

CC This sequence shows a Chrysosporium glycosyl hydrolase family 7 protein, 

CC CBH1. The CBH1 nucleic acid is used for the industrial production of CBH1 

CC protein by microbial fermentation. The CBH1 regulatory sequences 

CC (promoter and terminator) are useful for expressing heterologous 

CC polypeptides in microbes 

XX 

SQ Sequence 526 AA; 

Query Match 61.6%; Score 1689; DB 5; Length 526; 

Best Local Similarity 60.4%; Pred. No. 5.3e-101; 

Matches 311; Conservative 68; Mismatches 112; Indels 24; Gaps 10; 

Qy 1 QSACTLQSETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSSTNCYDGNTWSSTL 60 

hllll :| II III Ihllhll Ih lllllllll hhlllhll I :: 
Db 18 QNACTLTAENHPSLTWSKCTSGGSCTSVQGSITIDANWRWTHRTDSATNCYEGNKWDTSY 77 

Qy 61 CPDNEXCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSA-QKNVGARLYLMASDTTYQE 119 

I I HI Ihlll hllihillllll:: III: hhl III IN II 

Db 78 CSDGPSCASKCCIDGADYSSTYGITTSGNSLNLKFVTKGQYSTNIGSRTYLMESDTKYQM 137 

Qy 120 FTLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRD 179 

I lllllhllllll I llllllllllllllllhlll I lllllllllllllllll 

Db 138 FQLLGNEFTFDVDVSNLGCGLNGALYFVSMDADGGMSKYSGNKAGAKYGTGYCDSQCPRD 197 

Qy 180 LKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE 239 

lllllhllll 1= I I :||||||lhlll|::: I Mill :|| II 

Db 198 LKFINGEANVENWQSSTNDANAGTGKYGSCCSEMDVWEANNMAAAFTPHPCWVIGQSRCE 257 

Qy 240 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFETSGA 299 

II mill :|l I llllllhl II II :||| I hllllhllllll ■■ I 

Db 258 GDSCGGTYSTDRYAGICDPDGCDFNSYRQGNKTFYGKG--MTVDTTKKITVVTQFLKNSA 315 

Qy 300 INRYYVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFGGSSF-SDKGGLTQFK 353 

I hlllll : : II : hi ::| II : lll|: I 

Db 316 GELSEIKRFYVQNGKVIPNSESTIPGVEGNSITQDWCDRQKAAFGDVTDXQDKGGMVQMG 375 

Qy 354 KATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPN 413 

II =1 lllllhllh lllllllhl : : III Ihl |:|||||:||:::|| 
Db 376 KALAGPMVLVMSIWDDHAVNMLWLDSTWPI-DGAGKPGAERGACPTTSGVPAEVEAEAPN 434 



Qy 414 

Db 435 

Qy 463 QSHYGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL 497 

Db 492 




RESULT 138 
ABJ26886 

ID ABJ26886 Standard; protein; 529 AA. 
XX 

AC ABJ26886; 
XX 

DT 08-MAY-2003 (first entry) 
XX 

DE Cellobiohydrolase I activity protein SEQ ID No 4 . 

XX 

KW Cellobiohydrolase; enzyme; DNA shuffling; ethanol; biomass; 

KW cellobiohydrolase I; EC 3.2.1.91. 

XX 

OS Chaetomium thermophilum. 
XX 

PN WO2003000941-A2. 
XX 

PD 03-JAN-2003. 
XX 

PF 26-JUN-2002; 2002WO-DK000429 . 
XX 

PR 26-JUN-2001; 2001DK-00001000. 
XX 

PA (NOVO ) NOVOZYMES AS. 

XX 

PI Lange L, Wu W, Aubert D, Landvik S, Schnorr KM, Clausen IG; 
XX 

DR WPI; 2003-278244/27. 

DR N-PSDB; ABT23504 . 
XX 

PT New polypeptide with cellobiohydrolase I activity, useful in producing 

PT ethanol from biomass. 

XX 

PS Claim 4; Page 115-117; 199pp; English. 
XX 

CC The invention relates to a novel polypeptide comprising: part of any of 

CC 21 amino acid sequences; an amino acid sequence at least 70% identical to 

CC a polypeptide encoded by a cellobiohydrolase gene; an amino acid sequence 

CC at least 80% identical to the polypeptide encoded by 21 nucleotide 

CC sequences; a polypeptide encoded by a nucleotide sequence which 

CC hybridises with a probe selected from complementary strands of 55 

CC nucleotide sequences; or a fragment of the aforementioned structures. The 

CC polynucleotides of the invention are useful in a method of DNA shuffling. 

CC The polypeptides are useful in a method for producing ethanol from 

CC biomass comprising contacting the biomass with the polypeptides. This 

CC sequence represents a protein with cellobiohydrolase I activity of the 

CC invention 

XX 

SQ Sequence 529 AA; 

Query Match 61.6%; Score 1689; DB 6; Length 529; 

Best Local Similarity 59.3%; Pred. No. 5.3e-101; 

Matches 305; Conservative 76; Mismatches 113; Indels 20; Gaps 7; 
Qy 1 QSACTLQSETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSSTNCYDGNTWSSTL 60 



Db 




Qy 61 CPDNEXCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQ-KNVGARLYLMASDTTYQE 119 

I I ::|h Ihill hlllhlllhll:: llh Ilhhill HI || 

Db 79 CSDGKSCAQTCCVDGADYSSTYGITTSGDSLNLKFVTKHQYGTNVGSRVYLMENDTKYQM 138 

Qy 120 FTLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRD 179 

I lllllhllllll I IIIIIIIIIMIIIIIhlll I IIIIIIIIMhIIIM 

Db 139 FELLGNEFTFDVDVSNLGCGLNGALYFVSMDADGGMSKYSGNKAGAKYGTGYCDAQCPRD 198 

Qy 180 LKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE 239 

Illllhlhl I Ihhil I I :||||||llllllh:: I llllll :M II 
Db 199 LKFINGEANIENWTPSTNDANAGFGRYGSCCSEMDIWEANNMATAFTPHPCTIIGQSRCE 258 

Qy 240 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFETSGA 299 

h llllll II I llllllhl II h :|ll I hMllhllllll : I 
Db 259 GNSCGGTYSSERYAGVCDPDGCDFNAYRQGDKTFYGKG--MTVDTTKKMTVVTQFHKNSA 316 

Qy 300 INRYYVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFGG-SSFSDKGGLTQFK 353 

I hllhl || : -I I- || h llh I 

Db 317 GVLSEIKRFYVQDGKVIANAESKIPGNPGNSITQEWCDAQKVAFGDIDDFNRKGGMAQMS 376 

Qy 354 KATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPN 413 

II I IMIIhllhlllllMIIIII : MM Ihl hllllhMM II 

Db 377 KALEGPMVLVMSVWDDHYANMLWLDSTYPIDK-AGTPGAERGACPTTSGVPAEIEAQVPN 435 

Qy 414 AKVTFSNIKFGPIGSTGNPSGGNPPGG NPPGTTTTRRPATTTGSSP GPTQ 463 

I MlhlMMM h I II :|h I Ml hi I I 

Db 436 SNVIFSNIRFGPIGSTVPGLDGSTPSNPTATVAPPTSTTSVRSSTTQISTPTSQPGGCTT 495 

Qy 464 SHYGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL 497 

Mllllllhl I I MMI IIIMIIII 

Db 496 QKWGQCGGIGYTGCTNCVAGTTCTELNPWYSQCL 529 

RESULT 140 
ABB05058 

ID ABB05058 Standard; protein; 526 AA. 
XX 

AC ABB05058; 
XX 

DT 11-SEP-2003 (revised) 

DT 27-MAR-2002 (first entry) 

XX 

DE Trichoderma reesei cellobiohydrolase I (CBH1) 55kD (family 7) protein. 
XX 

KW Trichoderma reesei; filamentous fungi; phenotype; characterisation; 
KW fermentation; screening; morphology; cellobiohydrolase I; CBH1. 
XX 

OS Hypocrea jecorina. 
XX 

FH Key Location/Qualifiers 
FT Peptide 1..19 

FT /label= signal 

FT Protein 20..526 

FT /label= cellobiohydrolase_I 

FT Misc-difference 249 

FT /label= unknown 

FT /note= "encoded by NCC" 

FT Misc-difference 365 

FT /label= unknown 

FT /note= "encoded by TTN" 

XX 

PN WO200125468-A1. 
XX 

PD 12-APR-2001. 
XX 

PF 13-APR-2000; 2000WO-US010199 . 



PR 06-OCT-1999; 99WO-NL000618 . 
XX 

PA (EMAL/) EMALFARB M A. 
XX 

PI Emalfarb MA; 
XX 

DR WPI; 2001-281733/29. 

DR N-PSDB; ABA92722. 
XX 

PT Expressing heterologous proteins encoded by a library of DNA vectors, 

PT involves stably transforming mutant filamentous fungus with the vectors 

PT and culturing transformed fungi for expressing heterologous proteins. 
XX 

PS Disclosure; Page 66-69; 85pp; English. 



CC The present invention describes a method of expressing a number of 

CC proteins encoded by a library of DNA vectors (I). The method involves 

CC stably transforming a mutant filamentous fungus (II) with (I) so as to 

CC introduce into each of a number of individual fungi, at least one 

CC heterologous protein-encoding nucleic acid sequence (III), and culturing 

CC the transformed mutant filamentous fungi for the expression of 

CC heterologous proteins encoded by (III). (I) comprises a number of 

CC different vectors, each comprising a different protein-encoding nucleic 

CC acid sequence being operably linked to an expression regulating region 

CC and optionally a secretion signal encoding sequence. (II) has a phenotype 

CC characterised by a culture viscosity, when cultured in suspension, of 

CC less than 200 cP at the end of fermentation when grown with adequate 

CC nutrients under optimal or near-optimal conditions. The method is useful 

CC for expressing large quantities of heterologous proteins that are useful 

CC for isolation, characterisation and application testing, and also for 

CC commercial production of proteins. The mutant filamentous fungi obtained 

CC by the method are suitable for high-throughput screening techniques owing 



CC to their unique morphology and very low viscosity of their cultures. The 

CC present sequence represents the Trichoderma reesei cellobiohydrolase I 

CC (CBH1) 55kD (family 7) protein, which is given in the exemplification of 

CC the present invention. (Updated on 11-SEP-2003 to standardise OS field) 
XX 

SQ Sequence 526 AA; 



Query Match 61.4%; Score 1681; DB 4; Length 526; 

Best Local Similarity 60.4%; Pred. No. 1.7e-100; 

Matches 311; Conservative 68; Mismatches 112; Indels 24; Gaps 10; 



Qy 1 QSACTLQSETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSSTNCYDGNTWSSTL 60 

hllll U II III Ihllhll l|: lllllllll hhlllhll I :: 
Db 18 QNACTLTAENHPSLTWSKCTSGGSCTSVQGSITIDANWRWTHRTDSATNCYEGNKWDTSY 77 

Qy 61 CPDNEXCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSA-QKNVGARLYLMASDTTYQE 119 

I I HI Ihlll hlllhllllllh: llh hhl III III II 

Db 78 CSDGPSCASKCCIDGADYSSTYGITTSGNSLNLKFVTKGQYSTNIGSRTYLMESDTKYQM 137 

Qy 120 FTLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRD 179 

I lllllhllllll I llllllllllllllllhlll I lllllllllllllllll 

Db 138 FQLLGNEFTFDVDVSNLGCGLNGALYFVSMDADGGMSKYSGNKAGAKYGTGYCDSQCPRD 197 

Qy 180 LKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE 239 

illllhllll h hhll I I :||||||||:||||::: I Mill HI II 
Db 198 LKFINGEANVENWQSSTNDANAGTGKYGSCCSEMDVWEANNMAAAFTPHPCXVIGQSRCE 257 

Qy 240 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFETSGA 299 

II llllll :|l I llllllhl II II :||| I hllllhllllll : I 

Db 258 GDSCGGTYSTDRYAGICDPDGCDFNSYRQGNKTFYGKG--MTVDTTKKITVVTQFLKNSA 315 

Qy 300 INRYYVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFGGSSF-SDKGGLTQFK 353 

I hlllll : : II : hi ::| II : ||||: I 

Db 316 GELSEIKRFYVQNGKVIPNSESTIPGVEGNSITQDWCDRQKAAFGDVTDXQDKGGMVQMG 375 



Qy 354 KATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPN 413 

II H lllllhlll: lllllllhl : : III Ihl hllllhlh::|| 
Db 376 KALAGPMVLVMSIWDDHAVNMLWLDSTWPI-DGAGKPGAERGACPTTSGVPAEVEAEAPN 434 

Qy 414 AKVTFSNIKFGPIGST--GNPSG--GNPPGGNPPGTTTTRRP--ATTTGSSPGPT 462 

: I lllhlllllll III III III :::| I Hh I III 
Db 435 SNVIFSNIRFGPIGSTVSGLPDGGSGNP NPPVSSSTPVPSSSTTSSGSSGPTGGTGV 491 

Qy 463 QSHYGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL 497 

II lllllh:||| I I II II :|||ll 
Db 492 AKHYEQCGGIGFTGPTQCESPYTCTKLNDWYSQCL 526 



RESULT 141 
ABW00703 

ID ABW00703 standard; protein; 526 AA. 
XX 

AC ABW00703; 
XX 

DT 15-JAN-2004 (first entry) 

XX 

DE Chrysosporium lucknowense cellobiohydrolase (CBH1) protein. 
XX 

KW Mutant Chrysosporium strain; fungal enzyme; metabolite; organic acid; 
KW antibiotic; cellobiohydrolase; CBH1. 

XX 

OS Chrysosporium lucknowense. 
XX 

FH Key Location/Qualifiers 
FT Peptide 1..20 

FT /label= Signal_peptide 

FT Protein 21..526 

FT /note= "Mature CBH1 protein" 

FT Misc-difference 137 

FT /note= "Encoded by AGTAAGTTCCTCTCGCACCCGGCCGCCGGGAGATGAT 

FT GGCGCCCAGCCCGCTGACGCGAATGACACAGTG" 

FT Misc-difference 249 

FT /note= "Encoded by ACC" 

FT Misc-difference 365 

FT /note= "Encoded by TTN" 

FT Domain 495..526 

FT /note= "CBD domain" 

XX 

PN US6573086-B1. 

XX 

PD 03-JUN-2003. 
XX 

PF 13-APR-2000; 2000US-00548938 . 

XX 

PR 06-OCT-1998; 98WO-EP006496 . 
PR 06-OCT-1999; 99WO-NL000618 . 
XX 

PA (DYAD-) DYADIC INT INC. 
XX 

PI Emalfarb MA, Burlingame RP, Olson PT, Sinitsyn AP, Parriche M; 

PI Bousson JC, Pynnonen CM, Punt PJ, Van Zeijl CMJ; 

XX 

DR WPI; 2003-764575/72. 
DR N-PSDB; AAD61474 . 
XX 

PT New mutant Chrysosporium strain expressing a heterologous polypeptide, or 
PT overexpressing a homologous polypeptide, at a high level, useful for 
PT production of e.g. enzymes, primary metabolites, and antibiotics. 

XX 

PS Disclosure; Col 43-44; Opp; English. 
XX 



CC The invention relates to a mutant Chrysosporium strain comprising a 

CC nucleic acid encoding a polypeptide of interest, linked to an expression 

CC -regulating region chosen from promoter sequences associated with 

CC cellulase, xylanase or glyceraldehyde-3-phosphate dehydrogenase (gpdA) 

CC expression and optionally to a secretion signal sequence, where the 

CC mutant strain expresses the polypeptide at a higher level than a non- 

CC mutant strain under same conditions. The invention is useful for 

CC producing polypeptides such as carbohydrate-degrading enzymes, proteases, 

CC lipases, esterases, other hydrolases, oxidoreductases and transferases. 

CC The invention is also useful for producing fungal enzymes allowing 

CC production or overproduction of primary metabolites, organic acids, 

CC secondary metabolites or antibiotics. The present sequence is 

CC Chrysosporium lucknowense cellobiohydrolase (CBH1) protein 

XX 

SQ Sequence 526 AA; 



Query Match 61.4%; Score 1681; DB 7; Length 526; 

Best Local Similarity 60.4%; Pred. No. 1.7e-100; 

Matches 311; Conservative 68; Mismatches 112; Indels 24; Gaps 10; 



Qy 1 QSACTLQSETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSSTNCYDGNTWSSTL 60 

hllll :| II III Ihllhll Ih lllllllll hhlllhll I 
Db 18 QNACTLTAENHPSLTWSKCTSGGSCTSVQGSITIDANWRWTHRTDSATNCYEGNKWDTSY 77 

Qy 61 CPDNEXCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSA-QKNVGARLYLMASDTTYQE 119 

I I HI Ihlll hlllhllllllh: III: hhl III III II 

Db 78 CSDGPSCASKCCIDGADYSSTYGITTSGNSLNLKFVTKGQYSTNIGSRTYLMESDTKYQM 137 

Qy 120 FTLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRD 179 

I lllllhllllll I llllllllllllllllhlll I MIIIIIIIMIIIIII 

Db 138 FQLLGNEFTFDVDVSNLGCGLNGALYFVSMDADGGMSKYSGNKAGAKYGTGYCDSQCPRD 197 

Qy 180 LKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE 239 

Illllhllll h hhll I I Hllllllhlllh:: I Mill :|| || 
Db 198 LKFINGEANVENWQSSTNDANAGTGKYGSCCSEMDVWEANNMAAAFTPHPCXVIGQSRCE 257 

Qy 240 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFETSGA 299 

II mill :|l I llllllhl II II :MI I hllllhllllll : I 

Db 258 GDSCGGTYSTDRYAGICDPDGCDFNSYRQGNKTFYGKG--MTVDTTKKITVVTQFLKNSA 315 

Qy 300 INRYYVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFGGSSF-SDKGGLTQFK 353 

I hlllll : : II : h'l ::| II : lllh I 

Db 316 GELSEIKRFYVQNGKVIPNSESTIPGVEGNSITQDWCDRQKAAFGDVTDXQDKGGMVQMG 375 

Qy 354 KATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPN 413 

II H lllllhllh IIMIIIhl : : III Ihl hllllhlh::|| 
Db 376 KALAGPMVLVMSIWDDHAVNMLWLDSTWPI-DGAGKPGAERGACPTTSGVPAEVEAEAPN 434 

Qy 414 AKVTFSNIKFGPIGST--GNPSG--GNPPGGNPPGTTTTRRP--ATTTGSSPGPT 462 

I lllhlllllM Ml III III :::| I Uh I III 

Db 435 SNVIFSNIRFGPIGSTVSGLPDGGSGNP NPPVSSSTPVPSSSTTSSGSSGPTGGTGV 491 

Qy 463 QSHYGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL 497 

II IMIIhHIl I I II II :||lll 

Db 492 AKHYEQCGGIGFTGPTQCESPYTCTKLNDWYSQCL 526 



RESULT 142 
ABJ26904 

ID ABJ26904 Standard; protein; 525 AA. 
XX 

AC ABJ26904; 
XX 

DT 08-MAY-2003 (first entry) 
XX 

DE Cellobiohydrolase I activity protein SEQ ID No 60. 
XX 



KW Cellobiohydrolase; enzyme; DNA shuffling; ethanol; biomass; 

KW cellobiohydrolase I; EC 3.2.1.91. 

XX 

OS Scytalidium thermophilum. 
XX 

PN WO2003000941-A2 . 
XX 

PD 03-JAN-2003. 
XX 

PF 26-JUN-2002; 2002WO-DK000429 . 
XX 

PR 26-JUN-2001; 2001DK-00001000 . 
XX 

PA (NOVO ) NOVOZYMES AS. 
XX 

PI Lange L, Wu W, Aubert D, Landvik S, Schnorr KM, Clausen IG; 
XX 

DR WPI; 2003-278244/27. 

DR N-PSDB; ABT23542. 
XX 

PT New polypeptide with cellobiohydrolase I activity, useful in producing 

PT ethanol from biomass. 

XX 

PS Claim 4; Page 191-192; 199pp; English. 
XX 

CC The invention relates to a novel polypeptide comprising: part of any of 

CC 21 amino acid sequences; an amino acid sequence at least 70% identical to 

CC a polypeptide encoded by a cellobiohydrolase gene; an amino acid sequence 

CC at least 80% identical to the polypeptide encoded by 21 nucleotide 

CC sequences; a polypeptide encoded by a nucleotide sequence which 

CC hybridises with a probe selected from complementary strands of 55 

CC nucleotide sequences; or a fragment of the aforementioned structures. The 

CC polynucleotides of the invention are useful in a method of DNA shuffling. 

CC The polypeptides are useful in a method for producing ethanol from 

CC biomass comprising contacting the biomass with the polypeptides. This 

CC sequence represents a protein with cellobiohydrolase I activity of the 

CC invention 

XX 

SQ Sequence 525 AA; 

Query Match 60.8%; Score 1666; DB 6; Length 525; 

Best Local Similarity 57.5%; Pred. No. 1.6e-99; 

Matches 295; Conservative 78; Mismatches 118; Indels 22; Gaps 7; 

Qy 1 QSACTLQSETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSSTNCYDGNTWSSTL 60 

I Ihl =1 II |:hlh:|| I \: UHIIIII = Mill II I 

Db 19 QQACSLTTERHPSLSWKKCTAGGQCQTVQASITLDSNWRWTHQVSGSTNCYTGNKWDTSI 78 

Qy 61 CPDNEXCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQK-NVGARLYLMASDTTYQE 119 

I I ::|hll|:|ll I I I I I = I I = I = I I I : llh Mhl III ■■ II 

Db 79 CTDAKSCAQNCCVDGADYTSTYGITTNGDSLSLKFVTKGQHSTNVGSRTYLMDGEDKYQT 138 

Qy 120 FTLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRD 179 

I lllllhllllll : llllllllllllllll|:|:|| I llilllllllhlllll 

Db 139 FELLGNEFTFDVDVSNIGCGLNGALYFVSMDADGGLSRYPGNKAGAKYGTGYCDAQCPRD 198 

Qy 180 LKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE 239 

Ulllhlhlll hh I I I :hllllllllll|::: I llllll HI il 
Db 199 IKFINGEANIEGWTGSTNDPNAGAGRYGTCCSEMDIWEANNMATAFTPHPCTIIGQSRCE 258 

Qy 240 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFETS-- 297 

II IIMIh II I llllllhl II II :||| I hllllhllllll 

Db 259 GDSCGGTYSNERYAGVCDPDGCDFNSYRQGNKTFYGKG--MTVDTTKKITVVTQFLKDAN 316 

Qy 298 GAINRYYVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFGG-SSFSDKGGLTQFK 353 

I : hllhl : : II : hi II h 111: I 

Db 317 GDLGEVKRFYVQDGKIIPNSESTIPGVEGNSITQDWCDRQKVAFGDIDDFNRKGGMKQMG 376 



Qy 354 KATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPN 413 

II =1 lllllhllh :|lilllihl : : Ml Ihl |:|||||:||:::|| 
Db 377 KALAGPMVLVMSIWDDHASNMLWLDSTFPV-DAAGKPGAERGACPTTSGVPAEVEAEAPN 435 

Qy 414 AKVTFSNIKFGPIGST GNPSGGNPPGGNPPGTTTTRRPATTTGSSPGPTQS 464 

: I lllhlllllll I II III: Mill :| || 

Db 436 SNWFSNIRFGPIGSTVAGLPGAGNGGNNGGNPP PPTTTTSSAPATTTTASAGPKAG 492 

Qy 465 HYGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL 497 

h lllll|::||| I II II :||||| 

Db 493 HWQQCGGIGFTGPTQCEEPYTCTKLNDWYSQCL 525 



RESULT 9 

US-09-548-938A-10 

Sequence 10, Application US/09548938A 
Patent No. 6573086 
GENERAL INFORMATION: 
APPLICANT: EMALFARB, MARK AARON 
APPLICANT: BURLINGAME, RICHARD PAUL 
APPLICANT: OLSON, PHILIP TERRY 

APPLICANT: SINITSYN, ARKADY PANTELEIMONOVICK 
APPLICANT: PARRICHE, MARTINE 
APPLICANT: BOUSSON, JEAN CHRISTOPHE 
APPLICANT: PYNNONEN, CHRISTINE MARIE 
APPLICANT: PUNT, PETER JAN 

APPLICANT: VAN-ZEIJL, CORNELIA MARIA JOHANNA 

TITLE OF INVENTION: TRANSFORMATION SYSTEM IN THE FIELD OF FILAMENTOUS FUNGI 

FILE REFERENCE: 3123-4001 

CURRENT APPLICATION NUMBER: US/09/548,938A 
CURRENT FILING DATE: 2000-04-13 
NUMBER OF SEQ ID NOS: 19 
SOFTWARE: PatentIn Ver. 2.1 
SEQ ID NO 10 
LENGTH: 526 
TYPE: PRT 

ORGANISM: Chrysosporium lucknowense 
FEATURE: 

NAME/KEY: MOD_RES 
LOCATION: (249) 

OTHER INFORMATION: Variable amino acid 
FEATURE: 

NAME/KEY: MOD_RES 
LOCATION: (365) 

OTHER INFORMATION: Variable amino acid 
US-09-548-938A-10 

Query Match 61.4%; Score 1681; DB 2; Length 526; 

Best Local Similarity 60.4%; Pred. No. 2.4e-121; 

Matches 311; Conservative 68; Mismatches 112; Indels 24; Gaps 10; 

Qy 1 QSACTLQSETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSSTNCYDGNTWSSTL 60 

hllM :| II III Ihllhll ||: lllllllll hhllihil I 
Db 18 QNACTLTAENHPSLTWSKCTSGGSCTSVQGSITIDANWRWTHRTDSATNCYEGNKWDTSY 77 

Qy 61 CPDNEXCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSA-QKNVGARLYLMASDTTYQE 119 

II HI Ihlll hlllhllllllh: III: |:|:| III 111 M 

Db 78 CSDGPSCASKCCIDGADYSSTYGITTSGNSLNLKFVTKGQYSTNIGSRTYLMESDTKYQM 137 

Qy 120 FTLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRD 179 

I lllllhllllll I llllllllllllllllhlll I lllllllllllllilll 

Db 138 FQLLGNEFTFDVDVSNLGCGLNGALYFVSMDADGGMSKYSGNKAGAKYGTGYCDSQCPRD 197 


Qy 180 LKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE 239 

IIINhllll h hhll I I :||||||||:||lh:: I Mill HI II 
Db 198 LKFINGEANVENWQSSTNDANAGTGKYGSCCSEMDVWEANNMAAAFTPHPCXVIGQSRCE 257 



Qy 240 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFETSGA 299 

II mill HI I llllllhl II II :||| I hllllhllllll : I 
Db 258 GDSCGGTYSTDRYAGICDPDGCDFNSYRQGNKTFYGKG--MTVDTTKKITWTQFLKNSA 315

Qy 300 INRYYVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFGGSSF-SDKGGLTQFK 353 

I hlllll : : II : h| ::| || : ||lh I 

Db 316 GELSEIKRFYVQNGKVIPNSESTIPGVEGNSITQDWCDRQKAAFGDVTDXQDKGGMVQMG 375 

Qy 354 KATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPN 413 

II H lllllhllh lllllllhl : : III Ihl hllllhlh::|| 
Db 376 KALAGP^^\^^VMSIWDDHAVNMLWLDSTWPI-DGAGKPGAERGACPTTSGVPAEVEAEAPN 434 

Qy 414 AKVTFSNIKFGPIGST--GNPSG--GNPPGGNPPGTTTTRRP--ATTTGSSPGPT 462 

: I lllhlllllll III III III ■■■■■■\ I :|h I III 
Db 435 SNVIFSNIRFGPIGSTVSGLPDGGSGNP NPPVSSSTPVPSSSTTSSGSSGPTGGTGV 491 

Qy 463 QSHYGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL 497 

II lllll|::|ll II II II :||||| 
Db 492 AKHYEQCGGIGFTGPTQCESPYTCTKLNDWYSQCL 526 
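
Note on the summary figures: in the RESULT 9 summary above, the Best Local Similarity value appears to equal Matches divided by the total number of alignment columns (Matches + Conservative + Mismatches + Indels). This reading is inferred from the printed numbers and is not a documented GenCore definition. A minimal Python sketch, using the hypothetical helper name `best_local_similarity`, checks the arithmetic:

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Return matches as a percentage of all aligned columns.

    Assumption: the column total is the sum of the four printed counts.
    """
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns


# Counts copied from the RESULT 9 summary (Matches 311; Conservative 68;
# Mismatches 112; Indels 24).
print(round(best_local_similarity(311, 68, 112, 24), 1))  # 60.4
```

The same relation reproduces the 51.9% printed for RESULT 19 (counts 221, 76, 117, 12).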



RESULT 10 
US-08-676-166A-3 

; Sequence 3, Application US/08676166A 
; Patent No. 5955270
; GENERAL INFORMATION: 

APPLICANT: Radford, Alan 

APPLICANT: Parish, John H. 

TITLE OF INVENTION: EXPLOITATION OF THE CELLULASE COMPLEX OF 
; TITLE OF INVENTION: NEUROSPORA

NUMBER OF SEQUENCES: 7 

CORRESPONDENCE ADDRESS: 
; ADDRESSEE: David A. Jackson, Esq. 

; STREET: 411 Hackensack Ave, Continental Plaza, 4th 

; STREET: Floor

; CITY: Hackensack 

; STATE: New Jersey 

COUNTRY: USA

ZIP: 07601 
; COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 
; OPERATING SYSTEM: PC-DOS/MS-DOS

SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA:

APPLICATION NUMBER: US/08/676,166A

FILING DATE: 15-JUL-1996

CLASSIFICATION: 435
; ATTORNEY/AGENT INFORMATION: 
; NAME: Jackson Esq., David A. 

REGISTRATION NUMBER: 26,742 
; REFERENCE/DOCKET NUMBER: 1321-1-002

; TELECOMMUNICATION INFORMATION: 

TELEPHONE: 201-487-5800 
; TELEFAX: 201-343-1684 

INFORMATION FOR SEQ ID NO: 3:
; SEQUENCE CHARACTERISTICS: 

LENGTH: 525 amino acids 
; TYPE: amino acid 

STRANDEDNESS: single

TOPOLOGY: linear
MOLECULE TYPE: protein
HYPOTHETICAL: NO
; ORIGINAL SOURCE: 

ORGANISM: H. grisea 
US-08-676-166A-3 




Query Match 60.5%; Score 1658; DB 1; Length 525; 

Best Local Similarity 57.5%; Pred. No. 1.4e-119;

Matches 295; Conservative 77; Mismatches 119; Indels 22; Gaps 7;



Qy 1 QSACTLQSETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSSTNCYDGNTWSSTL 60 

I Ihl :| II hhlh:|l I h :hlill|| : Mill II | ::: 

Db 19 QQACSLTTERHPSLSWKKCTAGGQCQTVQASITLDSNWRWTHQVSGSTNCYTGNKWDTSI 78 

Qy 61 CPDNEXCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSA-QKNVGARLYLMASDTTYQE 119

I I ::|hllhlll I lllhlhhilh llh Ilhl Ml : II 

Db 79 CTDAKSCAQNCCVDGADYTSTYGITTNGDSLSLKFVTKGQYSTNVGSRTYLMDGEDKYQT 138 

Qy 120 FTLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRD 179 

I IIMIhllllll : MIIIIIIIIIIIIMhhIl I llllllilllhlllll 

Db 139 FELLGNEFTFDVDVSNIGCGLNGALYFVSMDADGGLSRYPGNKAGAKYGTGYCDAQCPRD 198 


Qy 180 LKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE 239 

Hlllhlhlll hh I I I MMIIIIIIIIIh:: I IIIIM :|l II 
Db 199 IKFINGEANIEGWTGSTNDPNAGAGRYGTCCSEMDIWEANNMATAFTPHPCTIIGQSRCE 258 

Qy 240 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFETS-- 297

II IMIIh II I llllllhl II II Mil I hllllhllllll 

Db 259 GDSCGGTYSNERYAGVCDPDGCDFNSYRQGNKTFYGKG--MTVDTTKKITVVTQFLKDAN 316 

Qy 298 GAINRYYVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFK 353

I I hllhl : : II : hi II h llh I 

Db 317 GDLGEIKRFYVQDGKIIPNSESTIPGVEGNSITQDWCDRQKVAFGDIDDFNRKGGMKQMG 376 

Qy 354 KATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPN 413 

II M lllllhllh Mllllllhl : : Ml Ihl hllllhlh:MI 
Db 377 KALAGPMVLVMSIWDDHASNMLWLDSTFPV-DAAGKPGAERGACPTTSGVPAEVEAEAPN 435 

Qy 414 AKVTFSNIKFGPIGST GNPSGGNPPGGNPPGTTTTRRPATTTGSSPGPTQS 464 

I lllhlllllll I MUM II llh Mill M II 

Db 436 SNWFSNIRFGPIGSTVAGLPGAGNGGNNGGNPP PPTTTTSSAPATTTTASAGPKAG 492 

Qy 465 HYGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL 497 

: lllllhMII I II II MUM 

Db 493 RWQQCGGIGFTGPTQCEEPYTCTKLNDWYSQCL 525 



RESULT 16 
US-08-676-166A-2 

; Sequence 2, Application US/08676166A 
; Patent No. 5955270 
; GENERAL INFORMATION: 

APPLICANT: Radford, Alan 

APPLICANT: Parish, John H. 

TITLE OF INVENTION: EXPLOITATION OF THE CELLULASE COMPLEX OF 
TITLE OF INVENTION: NEUROSPORA 
NUMBER OF SEQUENCES: 7 
CORRESPONDENCE ADDRESS: 
; ADDRESSEE: David A. Jackson, Esq. 

STREET: 411 Hackensack Ave, Continental Plaza, 4th 
; STREET: Floor

; CITY: Hackensack 

; STATE: New Jersey 

COUNTRY: USA
ZIP: 07601 
; COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS
SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA:

APPLICATION NUMBER: US/08/676,166A




FILING DATE: 15-JUL-1996
CLASSIFICATION: 435
ATTORNEY/AGENT INFORMATION:
NAME: Jackson Esq., David A. 
REGISTRATION NUMBER: 26,742 
REFERENCE/DOCKET NUMBER: 1321-1-002 
TELECOMMUNICATION INFORMATION: 
TELEPHONE: 201-487-5800
TELEFAX: 201-343-1684
INFORMATION FOR SEQ ID NO: 2:
SEQUENCE CHARACTERISTICS: 
LENGTH: 516 amino acids 
TYPE: amino acid 
TOPOLOGY: linear 
MOLECULE TYPE: protein 
US-08-676-166A-2 

Query Match 57.0%; Score 1561; DB 1; Length 516; 

Best Local Similarity 57.5%; Pred. No. 4.2e-112; 

Matches 294; Conservative 62; Mismatches 129; Indels 26; Gaps 10; 

Qy 1 QSACTLQSETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSSTNCYDGNTWSSTL 60 

I I II lllllh II I I ::|:|||||llll|: II II II I HI 

Db 18 QQAGTLTAKRHPSLTWQKCTRGGCPTLNT-TMVLDANWRWTHATSGSTKCYTGNKWQATL 76 

Qy 61 CPDNEXCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLMASDTTYQEF 120 

III :HI II MM I llhl II Ih: III Mill MM II II 

Db 77 CPDGKSCAANCALDGADYTGTYGITGSGWSLTLQFVTD NVGARAYLMADDTQYQML 132 

Qy 121 TLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDL 180 

11 I lllhl :|llllllll :||||lh mil Mill llllhllllll 
Db 133 ELLNQELWFDVDMSNIPCGLNGALYLSAMDADGGMRKYPTNKAGAKYATGYCDAQCPRDL 192 


Qy 181 KFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEG 240 

hill llllll Ihhil III IIMIIIIIIIIII :| I llllllh I :|M 
Db 193 KYINGIANVEGWTPSTNDAN-GIGDHGSCCSEMDIWEANKVSTAFTPHPCTTIEQHMCEG 251 

Qy 241 DGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFETSGA- 299 

I llllllhlll II lllhl Ihllhlll I hlh I MUM I 

Db 252 DSCGGTYSDDRYGVLCDADGCDFNSYRMGNTTFYGEGK--TVDTSSKFTVVTQFIKDSAG 309 

Qy 300 INRYYVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFGG-SSFSDKGGLTQFKK 354 

I MUM : : : III : M ::: II h MM I I 

Db 310 DLAEIKAFYVQNGKVIENSQSNVDGVSGNSITQSFCKSQKTAFGDIDDFNKKGGLKQMGK 369 


Qy 355 ATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPNA 414 

I : MMIhllh lllllllllll Ml Ml hlMlhh: :|h 

Db 370 ALAQAMVLVMSIWDDHAANMLWLDSTYP VPKVPGAYRGSGPTTSGVPAEVDANAPNS 426 

Qy 415 KVTFSNIKFGPI GSTGNPSGGNPPGGNPPGTTTTRRPATTTGSSP-GPTQSHY 466 

II lllllll : Ihl I II I ::| : MM hi I M: 

Db 427 KVAFSNIKFGHLGISPFSGGSSGTPP-SNPSSSASPTSSTAKPSSTSTASNPSGTGAAHW 485 

Qy 467 GQCGGIGYSGPTVCASGTTCQVLNPYYSQCL 497 

MMIhllll I II : lllh 

Db 486 AQCGGIGFSGPTTCPEPYTCAKDHDIYSQCV 516 



RESULT 19 
US-09-329-350-35 

; Sequence 35, Application US/09329350 
; Patent No. 6184019 
; GENERAL INFORMATION: 

APPLICANT: Miettinen-Oinonen, Arja 
; APPLICANT: Londesborough, John 
; APPLICANT: Vehmaanperä, Jari

APPLICANT: Haakana, Heli 



APPLICANT: Mäntylä, Arja
APPLICANT: Lantto, Raija 
APPLICANT: Elovainio, Minna 
APPLICANT: Joutsjoki, Vesa 
APPLICANT: Paloheimo, Marja 
APPLICANT: Suominen, Pirkko 

TITLE OF INVENTION: NOVEL CELLULASES, THE GENES ENCODING THEM AND 
TITLE OF INVENTION: USES THEREOF 
NUMBER OF SEQUENCES: 45 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Sterne, Kessler, Goldstein & Fox P.L.L.C. 
STREET: 1100 New York Avenue, N.W., Suite 600 
CITY: Washington 
STATE: D.C.
COUNTRY: USA
ZIP: 20005 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Diskette, 3.50 inch 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)
CURRENT APPLICATION DATA:

APPLICATION NUMBER: US/09/329,350
FILING DATE: Herewith 
CLASSIFICATION: 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 08/841,636 
FILING DATE: 30-APR-1997
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 60/005,335 
FILING DATE: 17-OCT-1995
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 60/007,926 
FILING DATE: 04-DEC-1995
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 60/020,840 
FILING DATE: 28-JUN-1996 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 08/732,181 
FILING DATE: 16-OCT-1996 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: PCT/FI96/00550 
FILING DATE: 17-OCT-1996
ATTORNEY/AGENT INFORMATION:
NAME: Shea Jr., Timothy
REGISTRATION NUMBER: 41,306

REFERENCE/DOCKET NUMBER: 1716.0510006/MAC/TJS
TELECOMMUNICATION INFORMATION:
TELEPHONE: (202)371-2600
TELEFAX: (202)371-2540
INFORMATION FOR SEQ ID NO: 35: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 452 amino acids 
TYPE: amino acid 
STRANDEDNESS:
TOPOLOGY: linear
MOLECULE TYPE: protein
ORIGINAL SOURCE:

ORGANISM: Melanocarpus albomyces
STRAIN: ALKO4237
FEATURE:

NAME/KEY: Protein 
LOCATION: 1..452 

OTHER INFORMATION: /label= 50K-cellulase-B 
US-09-329-350-35 



Query Match 44.5%; Score 1219; DB 2; Length 452;



Best Local Similarity 51.9%; Pred. No. 8.2e-86; 

Matches 221; Conservative 76; Mismatches 117; Indels 12; Gaps 10; 

Qy 9 ETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSSTNCYDGNTWSSTLCPDNEXCA 68 

I llllllhh: I I lllllllll I I mill h: I :|| 

Db 31 ENHPPLTWQRCTAPGNCQTVNAEVVIDANWRWLHDDNMQ-NCYDGNQWTNA--CSTATDCA 88

Qy 69 KNCCLDGAA-YASTYGVTTSGNSLSIGFVTQSAQ-KNVGARLYLMASDTTYQEFTLLGNE 126 

I ::|l I III :|l|::|:: llh llhl III II I hill 

Db 89 EKCMIEGAGDYLGTYGASTSGDALTLKFVTKHEYGTNVGSRFYLMNGPDKYQMFNLMGNE 148 

Qy 127 FSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQ 186 

Hllhl = Ihl llllhh llh: Ihl llhllllllhll llllh h 

Db 149 LAFDVDLSTVECGINSALYFVAMEEDGGMASYPSNQAGARYGTGYCDAQCARDLKFVGGK 208 

Qy 187 ANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEGDGCGGT 246 

Ihllh h:: I hi :||||:hhlhh : I III III HI MM 

Db 209 ANIEGWKSSTSDPNAGVGPYGSCCAEIDVWESNAYAFAFTPHACTTNEYHVCETTNCGGT 268 

Qy 247 YSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFETSGAINRYYVQ 306 

lh:h I II MMMIIhll Ml I Mlh:| llhMI ::M::| 
Db 269 YSEDRFAGKCDANGCDYNPYRMGNPDFYGKGK--TLDTSRKFTWSRFE-ENKLSQYFIQ 325 

Qy 307 NG--VTFQQPNAELGSYSGNELNDDYCTAEEAEFGG-SSFSDKGGLTQFKKATSGGMVLV 363 

M : I II : :|: : h I : I : II I I MM 

Db 326 DGRKIEIPPPTWE-GMPNSSEITPELCSTMFDVFNDRNRFEEVGGFEQLNNALRVPMVLV 384 

Qy 364 MSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKF 423 

IhllhMMIIIII II I Ml II I I IMIhlhl hhl :Mhl 

Db 385 MSIWDDHYANMLWLDSIYPP-EKEGQPGAARGDCPTDSGVPAEVEAQFPDAQWWSNIRF 443 

Qy 424 GPIGST 429 

IIIIM 

Db 444 GPIGST 449 



RESULT 20 
US-08-841-636A-35 

Sequence 35, Application US/08841636A 
Patent No. 6723549 
GENERAL INFORMATION: 

APPLICANT: Miettinen-Oinonen, Arja 
APPLICANT: Londesborough, John
APPLICANT: Vehmaanperä, Jari
APPLICANT: Haakana, Heli
APPLICANT: Mäntylä, Arja
APPLICANT: Lantto, Raija 
APPLICANT: Elovainio, Minna 
APPLICANT: Joutsjoki, Vesa 
APPLICANT: Paloheimo, Marja 
APPLICANT: Suominen, Pirkko

TITLE OF INVENTION: NOVEL CELLULASES, THE GENES ENCODING THEM AND 
TITLE OF INVENTION: USES THEREOF 
NUMBER OF SEQUENCES: 45 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Sterne, Kessler, Goldstein & Fox P.L.L.C. 
STREET: 1100 New York Avenue, N.W., Suite 600 
CITY: Washington
STATE: D.C.
COUNTRY: USA
ZIP: 20005 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Diskette, 3.50 inch 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS

SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)
CURRENT APPLICATION DATA:

APPLICATION NUMBER: US/08/841,636A
FILING DATE: 30-APR-1997 
CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 60/005,335 
FILING DATE: 17-OCT-1995
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 60/007,926 
; FILING DATE: 04-DEC-1995

PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 60/020,840 
FILING DATE: 28-JUN-1996 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 08/732,181 
FILING DATE: 16-OCT-1996
; PRIOR APPLICATION DATA: 

APPLICATION NUMBER: PCT/FI96/00550 
FILING DATE: 17-OCT-1996 
ATTORNEY/AGENT INFORMATION: 
NAME: Timothy J. Shea, Jr. 
REGISTRATION NUMBER: 41,306 
; REFERENCE/DOCKET NUMBER: 1716.0510005/MAC/TJS

; TELECOMMUNICATION INFORMATION: 
TELEPHONE: (202)371-2600 
TELEFAX: (202)371-2540 
; INFORMATION FOR SEQ ID NO: 35:
; SEQUENCE CHARACTERISTICS: 
; LENGTH: 452 amino acids 

; TYPE: amino acid 

STRANDEDNESS:
TOPOLOGY: linear
MOLECULE TYPE: protein
ORIGINAL SOURCE:
; ORGANISM: Melanocarpus albomyces

STRAIN: ALKO4237
FEATURE:

NAME/KEY: Protein
LOCATION: 1..452

OTHER INFORMATION: /label= 50K-cellulase-B 
US-08-841-636A-35 

Query Match 44.5%; Score 1219; DB 2; Length 452; 

Best Local Similarity 51.9%; Pred. No. 8.2e-86; 

Matches 221; Conservative 76; Mismatches 117; Indels 12; Gaps 10;

Qy 9 ETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSSTNCYDGNTWSSTLCPDNEXCA 68

I M M I M • I • • I I M M I M M I I I I M M I ' ' I Ml

Db 31 ENHPPLTWQRCTAPGNCQTVNAEWIDANWRWLHDDNMQ-NCYDGNQWTNA-CSTATDCA 88

Qy 69 KNCCLDGAA-YASTYGVTTSGNSLSIGFVTQSAQ-KNVGARLYLMASDTTYQEFTLLGNE 126

• I • • M I I M • M I • • I • • I M • I I I ' I I M M I I M M

Db 89 EKCMIEGAGDYLGTYGASTSGDALTLKFVTKHEYGTNVGSRFYLMNGPDKYQMFNLMGNE 148

Qy 127 FSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQ 186

Db

Qy 187 ANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEGDGCGGT 246

Db

Qy 247 YSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFETSGAINRYYVQ 306

Db

Qy 307 NG--VTFQQPNAELGSYSGNELNDDYCTAEEAEFGG-SSFSDKGGLTQFKKATSGGMVLV 363

Db 326 DGRKIEIPPPTWE-GMPNSSEITPELCSTMFDVFNDRNRFEEVGGFEQLNNALRVPMVLV 384



Qy 364 MSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKF 423 

Ihllhlllllllll II I III II I I llllhlhl hhl :|||:| 

Db 385 MSIWDDHYANMLWLDSIYPP-EKEGQPGAARGDCPTDSGVPAEVEAQFPDAQWWSNIRF 443 

Qy 424 GPIGST 429 

mill 

Db 444 GPIGST 449 



RESULT 32 
US-08-709-979A-1 

; Sequence 1, Application US/08709979A 
; Patent No. 5912157 
; GENERAL INFORMATION: 
; APPLICANT: Claus von der Osten 
APPLICANT: Martin Schülein

TITLE OF INVENTION: Novel Alkaline Cellulases
NUMBER OF SEQUENCES: 7 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Novo Nordisk of North America, Inc.
; STREET: 405 Lexington Avenue, 64th Floor 

; CITY: New York 

STATE: New York 

COUNTRY: United States of America 

ZIP: 10174-6401 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.30 
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/709,979A

FILING DATE: 09-SEP-1996 

CLASSIFICATION: 435 
ATTORNEY/AGENT INFORMATION: 
; NAME: Lambiris, Elias J. 

REGISTRATION NUMBER: 33,728 
; REFERENCE/DOCKET NUMBER: 4160.404-US 

; TELECOMMUNICATION INFORMATION: 
; TELEPHONE: 212-867-0123 

; TELEFAX: 212-878-9655 

; INFORMATION FOR SEQ ID NO: 1:

SEQUENCE CHARACTERISTICS: 
; LENGTH: 456 amino acids 

; TYPE: amino acid 

STRANDEDNESS: single

TOPOLOGY: linear 
MOLECULE TYPE: protein 
US-08-709-979A-1 



Query Match 28.0%; Score 767.5; DB 1; Length 456; 

Best Local Similarity 36.5%; Pred. No. 5.3e-51; 

Matches 173; Conservative 71; Mismatches 161; Indels 69; Gaps 16; 

Qy 9 ETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSSTNCYD-GNTWSSTLCPDNEXC 67 

I II H :|: I ::| :|:|| I II II : I III hi 

Db 28 EVHPQITTYRCTKADGCEEKTNYIVLDALSHPVHQVDNPYNCGDWGQKPNETACPDLESC 87 

Qy 68 AKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKN-VGARLYLMASDTT YQEFTLL 123 

hi! :| : :||:| I jj : I II hlh I I h | 

Db 88 ARNCIMDPVSDYGRHGVSTDGTSLRL KQLVGGNWSPRVYLL- -DETKERYEMLKLT 142 

Qy 124 GNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQC PRDL 180 

lllhllll ::||||:| Ml III I h III Hllllhll | 
Db 143 GNEFTFDVDATKLPCGMNSALYLSEMDATGARSE--LNPGGATFGTGYCDAQCYVTP 197 




Qy 181 KFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEG 240 

MM hi I hMMMMMh :: : MM: I :||| 

Db 198 -FINGLGNIE GKGACCNEMDIWEANARAQHIAPHPCSKAGPYLCEG 242 


Qy 241 DGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFETSG-- 298 

I : I M Ml llllh I Ml h I Mlh MUM I 

Db 243 AEC EFDGVCDKNGCAWNPYRVNVTDYYGEGAEFRVDTTRPFSWTQFRAGGDA 295 

Qy 299 AINRYYVQNGVTFQQPNAEL-GSYSGNELNDDYCTAEEAEFGGSSFSDKGGLTQ 351 

M I Mhl : : I : : hM I I : I- I : 

Db 296 GGGKLESIYRLFVQDGRVIESYWDKPGIiPPTDRMTDEFCAAT GAARFTELGAMEA 351 

Qy 352 FKKATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQS 411 

I : MM MM M MM II I h : 

Db 352 MGDALTRGMVLALSIWWSEGDNMNWLDS GEAGPCDPDEGNPSNIIRVQ 399 

Qy 412 PNAKVTFSNIKFGPIGSTGNPSGGNPPGGNPPGTTTTRRPATTTGSSPGPTQSH 465

h M llh:M MM I : M I : M I I M: 
Db 400 PDPEWFSNLRWGEIGST-YESAVDGPVGKGKGKGKGKAPA GDGNGKEKSN 449 



RESULT 33 
US-08-709-974A-11 

Sequence 11, Application US/08709974A 
Patent No. 6117664 
GENERAL INFORMATION: 

APPLICANT: Schülein, Martin
APPLICANT: Rosholm, Peter
APPLICANT: Nielsen, Jack Bech
APPLICANT: Hansen, Svend Aage
APPLICANT: von der Osten, Claus

TITLE OF INVENTION: Novel Alkaline Cellulases

NUMBER OF SEQUENCES: 11 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Novo Nordisk of North America, Inc.
STREET: 405 Lexington Avenue, 64th Floor 
CITY: New York 
STATE: New York 

COUNTRY: United States of America 
ZIP: 10174-6401 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS
SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA:

APPLICATION NUMBER: US/08/709,974A
FILING DATE: 09-SEP-1996
CLASSIFICATION: 435
ATTORNEY/AGENT INFORMATION:
NAME: Gregg, Valeta
REGISTRATION NUMBER: 35,127y
REFERENCE/DOCKET NUMBER: 4160.414-US
TELECOMMUNICATION INFORMATION:
TELEPHONE: 212-867-0123
TELEFAX: 212-878-9655
INFORMATION FOR SEQ ID NO: 11:
SEQUENCE CHARACTERISTICS:
LENGTH: 456 amino acids 
TYPE: amino acid 
STRANDEDNESS: single
TOPOLOGY: linear 
MOLECULE TYPE: protein 
US-08-709-974A-11 



Query Match 27.8%; Score 762.5; DB 2; Length 456;



Best Local Similarity 36.3%; Pred. No. 1.3e-50; 

Matches 172; Conservative 72; Mismatches 161; Indels 69; Gaps 16; 

Qy 9 ETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSSTNCYD-GNTWSSTLCPDNEXC 67 

I II :| :h I ::| :|:|| I II II : MM hi 

Db 28 EVHPQITTYRCTKADGCEEKTNYIVLDALSHPVHQVDNPYNCGDWGQKPNETACPDLESC 87 

Qy 68 AKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKN-VGARLYLMASDTT YQEFTLL 123 

hll :| : :||:| I II : I II hlh I I h I 

Db 88 ARNCIMDPVSDYGRHGVSTDGTSLRL KQLVGGNWSPRVYLL- -DETKERYEMLKLT 142 

Qy 124 GNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQC PRDL 180 

lllhllll ::|ll|:| III III I h III Hllllhll I 
Db 143 GNEFTFDVDATKLPCGMNSALYLSEMDATGARSE--LNPGGATFGTGYCDAQCYVTP 197 

Qy 181 KFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEG 240 

MM hi I hlhlllllllh : lllh I MM 

Db 198 -FINGLGNIE GKGACCNEMDIWEANARAQHIAPHPCSKAGPYLCEG 242 


Qy 241 DGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFETSG-- 298 

I MM Ml Mllh I Ml h I Mlh MUM I 

Db 243 AEC EFDGVCDKNGCAWNPYRVNVTDYYGEGAEFRVDTTRPFSWTQFRAGGDA 295 

Qy 299 AINRYYVQNGVTFQQPNAEL-GSYSGNELNDDYCTAEEAEFGGSSFSDKGGLTQ 351 

M I MIM : : I : : hM I I : I- I : 

Db 296 GGGKLESIYRLFVQDGRVIESYWDKPGLPPTDRMTDEFCAAT GAARFTELGAMEA 351 

Qy 352 FKKATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQS 411 

I : MM MM M MM M I h : 

Db 352 MGDALTRGMVLALSIWWSEGNDMNWLDS GEAGPCDPDEGNPSNIIRVQ 399 


Qy 412 PNAKVTFSNIKFGPIGSTGNPSGGNPPGGNPPGTTTTRRPATTTGSSPGPTQSH 465 

h M Mh:M MM I : II I : M I I M: 
Db 400 PDPEWFSNLRWGEIGST-YESAVDGPVGKGKGKGKGKAPA GDGNGKEKSN 449



RESULT 34 
US-09-329-350-33 

Sequence 33, Application US/09329350 
Patent No. 6184019 
GENERAL INFORMATION: 

APPLICANT: Miettinen-Oinonen, Arja 
APPLICANT: Londesborough, John
APPLICANT: Vehmaanperä, Jari
APPLICANT: Haakana, Heli
APPLICANT: Mäntylä, Arja
APPLICANT: Lantto, Raija 
APPLICANT: Elovainio, Minna 
APPLICANT: Joutsjoki, Vesa 
APPLICANT: Paloheimo, Marja 
APPLICANT: Suominen, Pirkko 

TITLE OF INVENTION: NOVEL CELLULASES, THE GENES ENCODING THEM AND 
TITLE OF INVENTION: USES THEREOF 
NUMBER OF SEQUENCES: 45 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Sterne, Kessler, Goldstein & Fox P.L.L.C. 
STREET: 1100 New York Avenue, N.W., Suite 600 
CITY: Washington 
STATE: D.C. 
COUNTRY: USA
ZIP: 20005 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Diskette, 3.50 inch 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS

SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)
CURRENT APPLICATION DATA:

APPLICATION NUMBER: US/09/329,350

FILING DATE: Herewith 

CLASSIFICATION: 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 08/841,636 

FILING DATE: 30-APR-1997 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 60/005,335 

FILING DATE: 17-OCT-1995
; PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 60/007,926 
; FILING DATE: 04-DEC-1995

; PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 60/020,840 

FILING DATE: 28-JUN-1996 
; PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 08/732,181 

FILING DATE: 16-OCT-1996
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: PCT/FI96/00550 

FILING DATE: 17-OCT-1996 
ATTORNEY/AGENT INFORMATION: 

NAME: Shea Jr., Timothy

REGISTRATION NUMBER: 41,306

REFERENCE/DOCKET NUMBER: 1716.0510006/MAC/TJS
TELECOMMUNICATION INFORMATION:
TELEPHONE: (202)371-2600
TELEFAX: (202)371-2540 
; INFORMATION FOR SEQ ID NO: 33: 

SEQUENCE CHARACTERISTICS: 
; LENGTH: 428 amino acids 

; TYPE: amino acid 

STRANDEDNESS:
TOPOLOGY: linear 
MOLECULE TYPE: protein 
; ORIGINAL SOURCE: 

; ORGANISM: Melanocarpus albomyces 

STRAIN: ALKO4237
; FEATURE:

NAME/KEY: Protein

LOCATION: 1..428

OTHER INFORMATION: /label= 50K-cellulase 
US-09-329-350-33 



Query Match 27.6%; Score 757; DB 2; Length 428; 

Best Local Similarity 38.4%; Pred. No. 3.1e-50; 

Matches 166; Conservative 59; Mismatches 151; Indels 56; Gaps 12;


Qy 9 ETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSSTNCYD-GNTWSSTLCPDNEXC 67 

I II II :h I :| :|:|: I :: II II ::| III hi 

Db 28 EVHPQLTTPRCTKADGCQPRTNYIVLDSLSHPVHQVDNDYNCGDWGQKPNATACPDVESC 87 

Qy 68 AKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLM-ASDTTYQEFTLLGNE 126

hll -I I HUM II : : : I hlh - h III 

Db 88 ARNCIMEGVPDYSQHGVTTSDTSLRLQQLVDG--RLVTPRVYLLDETEHRYEMMHLTGQE 145 



Qy 127 FSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQC PRDLKFI 183 

hhll ::||lhl III Mil: I || llllllhll | || 
Db 146 FTFEVDATKLPCGMNSALYLSEMDPTGARSE- -LNPGGAYYGTGYCDAQCFVTP FI 199 


Qy 184 NGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEGDGC 243

II hi I lllhlllllllll : : II I I :||| I 

Db 200 NGIGNIE GKGSCCNEMDIWEANSRATHVAPHTCNQTGLYMCEGAEC 245 



Qy 244 GGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFETSG 298 

I I II III llllh I :|| :| :|| : |||||l 
Db 246 EYDGVCDKDGCGWNPYRVNITDYYGNSDAFRVDTRRPFTWTQFPADAEGRLE 298 



Qy 299 AINRYYVQNGVTFQQPNAEL-GSYSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATS 357 

llhl : : I : |||::| | | : : | || I : 

Db 299 SIHRLYVQDGKVIESYWDAPGLPRTDSLNDEFCAAT GAARYLDLGGTAGMGDAMT 354 


Qy 358 GGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVT 417 

llll Ihl I I llll II I I : h :|| 

Db 355 RGMVLAMSIWWDESGFMNWLDS GEAGPCLPDEGDPKNIVKVEPSPEVT 402 

Qy 418 FSNIKFGPIGST 429 

:||:::| llll 
Db 403 YSNLRWGEIGST 414 



RESULT 35 
US-08-841-636A-33 

Sequence 33, Application US/08841636A
Patent No. 6723549 
GENERAL INFORMATION: 

APPLICANT: Miettinen-Oinonen, Arja 
APPLICANT: Londesborough, John
APPLICANT: Vehmaanperä, Jari
APPLICANT: Haakana, Heli
APPLICANT: Mäntylä, Arja
APPLICANT: Lantto, Raija 
APPLICANT: Elovainio, Minna 
APPLICANT: Joutsjoki, Vesa 
APPLICANT: Paloheimo, Marja 
APPLICANT: Suominen, Pirkko

TITLE OF INVENTION: NOVEL CELLULASES, THE GENES ENCODING THEM AND 
TITLE OF INVENTION: USES THEREOF 
NUMBER OF SEQUENCES: 45 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Sterne, Kessler, Goldstein & Fox P.L.L.C. 
STREET: 1100 New York Avenue, N.W., Suite 600
CITY: Washington
STATE: D.C.
COUNTRY: USA
ZIP: 20005 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Diskette, 3.50 inch 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)
CURRENT APPLICATION DATA:

APPLICATION NUMBER: US/08/841,636A
FILING DATE: 30-APR-1997
CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 60/005,335 
FILING DATE: 17-OCT-1995 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 60/007,926 
FILING DATE: 04-DEC-1995 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 60/020,840 
FILING DATE: 28-JUN-1996 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 08/732,181 
FILING DATE: 16-OCT-1996
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: PCT/FI96/00550
FILING DATE: 17-OCT-1996
ATTORNEY/AGENT INFORMATION:
NAME: Timothy J. Shea, Jr.
REGISTRATION NUMBER: 41,306

REFERENCE/DOCKET NUMBER: 1716.0510005/MAC/TJS

TELECOMMUNICATION INFORMATION:
TELEPHONE: (202)371-2600
TELEFAX: (202)371-2540
INFORMATION FOR SEQ ID NO: 33: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 428 amino acids 
TYPE: amino acid 
STRANDEDNESS:
TOPOLOGY: linear
MOLECULE TYPE: protein
ORIGINAL SOURCE:

ORGANISM: Melanocarpus albomyces
STRAIN: ALKO4237
FEATURE:

NAME/KEY: Protein 
LOCATION: 1..428 

OTHER INFORMATION: /label= 50K-cellulase 
US-08-841-636A-33 

Query Match 27.6%; Score 757; DB 2; Length 428; 

Best Local Similarity 38.4%; Pred. No. 3.1e-50; 

Matches 166; Conservative 59; Mismatches 151; Indels 56; Gaps 12; 

Qy 9 ETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSSTNCYD-GNTWSSTLCPDNEXC 67 

I II II :h I :| :|:|: I II II ::| III hi 

Db 28 EVHPQLTTFRCTKADGCQPRTNYIVLDSLSHPVHQVDNDYNCGDWGQKPNATACPDVESC 87 

Qy 68 AKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLM-ASDTTYQEFTLLGNE 126 

hll -I I :|llll II : : : I hlh - h III 

Db 88 ARNCIMEGVPDYSQHGVTTSDTSLRLQQLVDG--RLVTPRVYLLDETEHRYEMMHLTGQE 145

Qy 127 FSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQC---PRDLKFI 183 

hhll ::|ll|:| III II I h I M llllllhll I II 
Db 146 FTFEVDATKLPCGMNSALYLSEMDPTGARSE--LNPGGAYYGTGYCDAQCFVTP FI 199 

Qy 184 NGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEGDGC 243 

II hi I lllhlllllllM : : II I I :|ll I 

Db 200 NGIGNIE GKGSCCNEMDIWEANSRATHVAPHTCNQTGLYMCEGAEC 245 

Qy 244 GGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFETSG 298 

I I II III llllh I :|| :| :|| : llllll 
Db 246 EYDGVCDKDGCGWNPYRVNITDYYGNSDAFRVDTRRPFTWTQFPADAEGRLE 298

Qy 299 AINRYYVQNGVTFQQPNAEL-GSYSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATS 357 

H:| llhl : : I : Ml-I I I : : I II I : 

Db 299 SIHRLYVQDGKVIESYWDAPGLPRTDSLNDEFCAAT GAARYLDLGGTAGMGDAMT 354 

Qy 358 GGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVT 417 

MM Ihl I I MM II I I : |: Ml 

Db 355 RGMVLAMSIWWDESGFMNWLDS GEAGPCLPDEGDPKNIVKVEPSPEVT 402 

Qy 418 FSNIKFGPIGST 429 

Mh:M MM 

Db 403 YSNLRWGEIGST 414 
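
As an illustration of how a Qy/Db pair can be compared column by column, the sketch below counts identical residues in the final alignment block of RESULT 35 (Qy 418-429 against Db 403-414). The function name `count_identities` is hypothetical, and the code is a plain string comparison, not the scoring used by the search program.

```python
def count_identities(query_row, db_row):
    """Count columns where both aligned rows carry the same residue.

    Assumes the two rows are already aligned and of equal length;
    gap characters ('-') never count as identities.
    """
    if len(query_row) != len(db_row):
        raise ValueError("aligned rows must have the same length")
    return sum(1 for q, d in zip(query_row, db_row) if q == d and q != "-")


# Rows copied from the last Qy/Db pair of RESULT 35.
print(count_identities("FSNIKFGPIGST", "YSNLRWGEIGST"))  # 7
```

This counts identities only. Classifying the remaining columns as conservative or mismatched would additionally need the substitution matrix named in the search header (BLOSUM62).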



RESULT 36 
US-08-709-974A-3 

Sequence 3, Application US/08709974A 
Patent No. 6117664 
GENERAL INFORMATION: 

APPLICANT: Schülein, Martin
APPLICANT: Rosholm, Peter
APPLICANT: Nielsen, Jack Bech
APPLICANT: Hansen, Svend Aage
APPLICANT: von der Osten, Claus

TITLE OF INVENTION: Novel Alkaline Cellulases



NUMBER OF SEQUENCES: 11 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Novo Nordisk of North America, Inc.
; STREET: 405 Lexington Avenue, 64th Floor 

CITY: New York 
; STATE: New York 

; COUNTRY: United States of America 

ZIP: 10174-6401 
COMPUTER READABLE FORM: 
; MEDIUM TYPE: Floppy disk 

; COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS 
SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA:

APPLICATION NUMBER: US/08/709,974A
; FILING DATE: 09-SEP-1996 

CLASSIFICATION: 435 
ATTORNEY/AGENT INFORMATION: 
NAME: Gregg, Valeta
REGISTRATION NUMBER: 35,127y
REFERENCE/DOCKET NUMBER: 4160.414-US
TELECOMMUNICATION INFORMATION:
TELEPHONE: 212-867-0123
TELEFAX: 212-878-9655
INFORMATION FOR SEQ ID NO: 3:
SEQUENCE CHARACTERISTICS:
LENGTH: 409 amino acids
; TYPE: amino acid 

; STRANDEDNESS: single

TOPOLOGY: linear 
MOLECULE TYPE: protein 
US-08-709-974A-3 

Query Match 27.4%; Score 750.5; DB 2; Length 409; 

Best Local Similarity 38.3%; Pred. No. 9.3e-50; 

Matches 171; Conservative 51; Mismatches 160; Indels 65; Gaps 15; 

Qy 9 ETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSSTNCYD-GNTWSSTLCPDNEXC 67 

I II I I :|| :| II I I I I I ::| III :| 

Db 8 EQHPKLETYRCTKASGCKKQTNYIVADAG---IHGIRRSAGCGDWGQKPNATACPDEASC 64 

Qy 68 AKNCCLDGA AYASTYGVTTSGNSLSIGFVTQSAQKN- -VGARLYLMASD-TTYQEFT 121 

INI II II : hMIII I : I II |:||: : h 

Db 65 AKNCILSGMDSNAYKNA-GITTSGNKLRL QQL INNQLVSPRVYLLEENKKKYEMLH 119 

Qy 122 LLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQC PR 178 

I I lllllh: :||||:|llll I INI : III II lllhll I 

Db 120 LTGTEFSFDVEMEKLPCGMNGALYLSEMPQDGGKSTSRNSKAGAYYGAGYCDAQCYVTP- 178 


Qy 179 DLKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEIC 238 

INI h: I I IhhIIIIIII : : lll|: I I 

Db 179 FINGVGNIK GQGVCCNELDIWEANSRATHIAPHPCSKPGLYGC 221 


Qy 239 EGDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFETSG 298 

II m I II II II |: I III I : :h|:| II HI : 

Db 222 TGDECGSS GICDKAGCGWNHNRINVTDFYGRGKQYKVDSTRKFTVTSQFVANK 274 

Qy 299 AINRYYVQNGVTFQQPNAEL-GSYSGNELNDDYCTAEEAEFGGSSFSDKGGLTQF 352 

-hhh : : I I :|| III | : : || | 

Db 275 QGDLIELHRHYIQDNKVIESAWNISGPPKINFINDKYCAAT GANEYMRLGGTKQM 330 

Qy 353 KKATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSP 412 

I I INI Ihl I III I I I : I I : I 

Db 331 GDAMSRGMVLAMSVWWSEGDFMAWLDQ GVAGPCDATEGDPKNIVKVQP 378

Qy 413 NAKVTFSNIKFGPIGSTGNPSGGNPPG 439 

I HIMII: I Mil : II 



Db 379 NPEVTFSNIRIGEIGSTSSVKAPAYPG 405 



RESULT 37 
US-09-069-632-2 

; Sequence 2, Application US/09069632 
; Patent No. 6261828 
; GENERAL INFORMATION: 

APPLICANT: Lund, Henrik 
; TITLE OF INVENTION: A Process For Combined Desizing 
TITLE OF INVENTION: And Stone-Washing of Dyed Denim 
NUMBER OF SEQUENCES: 3 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Novo Nordisk of North America, Inc.
; STREET: 405 Lexington Avenue

CITY: New York
STATE: NY
COUNTRY: USA
ZIP: 10174 
COMPUTER READABLE FORM: 
MEDIUM TYPE: Diskette 
COMPUTER: IBM Compatible 
OPERATING SYSTEM: DOS 

SOFTWARE: FastSEQ for Windows Version 2.0 
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/09/069,632
; FILING DATE: 29-APR-1998 

CLASSIFICATION: 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: PCT/DK96/00469 

FILING DATE: 15-NOV-1996

APPLICATION NUMBER: 1278/95

FILING DATE: 15-NOV-1995
ATTORNEY/AGENT INFORMATION: 
; NAME: Gregg, Valeta 

REGISTRATION NUMBER: 35,127 

REFERENCE/DOCKET NUMBER: 4588.204-US
TELECOMMUNICATION INFORMATION: 
; TELEPHONE: 212-867-0123 

; TELEFAX: 212-878-9655 

; INFORMATION FOR SEQ ID NO: 2:
; SEQUENCE CHARACTERISTICS: 

; LENGTH: 409 amino acids 

; TYPE: amino acid 

STRANDEDNESS: single

TOPOLOGY: linear 
MOLECULE TYPE: protein 
US-09-069-632-2 



Query Match 27.4%; Score 750.5; DB 2; Length 409; 

Best Local Similarity 38.3%; Pred. No. 9.3e-50; 

Matches 171; Conservative 51; Mismatches 160; Indels 65; Gaps 15; 

Qy 9 ETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSSTNCYD-GNTWSSTLCPDNEXC 67 

I II I :h I :|l :| II I | | || ::j ||| :| 

Db 8 EQHPKLETYRCTKASGCKKQTNYIVADAG IHGIRRSAGCGDWGQKPNATACPDEASC 64 



Qy 68 AKNCCLDGA AYASTYGVTTSGNSLSIGFVTQSAQKN- -VGARLYLMASD-TTYQEFT 121 

Illl II W hlllll I : I II |:||: : h 

Db 65 AKNCILSGMDSNAYKNA-GITTSGNKLRL QQLINNQLVSPRVYLLEENKKKYEMLH 119 

Qy 122 LLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQC PR 178 

1 I lllllh: :|llhlllll I Illl : ||| II Mihil I 

Db 120 LTGTEFSFDVEMEKLPCGMNGALYLSEMPQDGGKSTSRNSKAGAYYGAGYCDAQCYVTP- 178 


Qy 179 DLKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEIC 238



Db 179 ---FINGVGNIK GQGVCCNELDIWEANSRATHIAPHPCSKPGLYGC 221


Qy 239 EGDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFETSG 298

Nil: I II II 11 |: I III I : :|:|:| II HI : 

Db 222 TGDECGSS GICDKAGCGWNHNRINVTDFYGRGKQYKVDSTRKFTVTSQFVANK 274 

Qy 299 AINRYYVQNGVTFQQPNAEL-GSYSGNELNDDYCTAEEAEFGGSSFSDKGGLTQF 352 

-hhh : : I I :||lll I : ^ II I 

Db 275 QGDLIELHRHYIQDNKVIESAWNISGPPKINFINDKYCAAT GANEYMRLGGTKQM 330 

Qy 353 KKATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSP 412 

I I MM MM I III I I I : II : I 

Db 331 GDAMSRGMVLAMSVWWSEGDFMAWLDQ GVAGPCDATEGDPKNIVKVQP 378 

Qy 413 NAKVTFSNIKFGPIGSTGNPSGGNPPG 439 

I Mllllh I MM : II 
Db 379 NPEVTFSNIRIGEIGSTSSVKAPAYPG 405 



RESULT 38 
US-08-361-920-25 

Sequence 25, Application US/08361920 
Patent No. 5457046 
GENERAL INFORMATION: 

APPLICANT: Woeldike, Helle F. 
APPLICANT: Hagen, Frederick 
APPLICANT: Hjort, Carsten M. 
APPLICANT: Sven, Hastrup 

TITLE OF INVENTION: An Enzyme Capable of Degrading Cellulose 
TITLE OF INVENTION: or Hemicellulose 
NUMBER OF SEQUENCES: 85 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Novo Nordisk of North America, Inc.
STREET: 405 Lexington Avenue, 62nd Floor 
CITY: New York 
STATE: New York 

COUNTRY: United States of America 

ZIP: 10174-6201 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 
SOFTWARE: PatentIn Release #1.0, Version #1.25
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/361,920 
FILING DATE: 
CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 07/940,860 
FILING DATE: 28-OCT-1992 
APPLICATION NUMBER: DK 1158/90 
FILING DATE: 09-MAY-1990 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: PCT/DK91/00124 
FILING DATE: 08-MAY-1991 
ATTORNEY/AGENT INFORMATION:
NAME: Lambiris, Elias J.
REGISTRATION NUMBER: 33,728
REFERENCE/DOCKET NUMBER: 3435.204-US
TELECOMMUNICATION INFORMATION:
TELEPHONE: 212-867-0123
TELEFAX: 212-867-0298
INFORMATION FOR SEQ ID NO: 25: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 427 amino acids 
TYPE: amino acid 
TOPOLOGY: linear 



MOLECULE TYPE: protein 
US-08-361-920-25 



Query Match 27.4%; Score 750.5; DB 1; Length 427; 

Best Local Similarity 38.3%; Pred. No. 9.9e-50;

Matches 171; Conservative 51; Mismatches 160; Indels 65; Gaps 15; 

Qy 9 ETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYD-GNTWSSTLCPDNEXC 67

I II I :h I :|| :| II I I I I I \ III H 

Db 26 EQHPKLETYRCTKASGCKKQTNYIVADAG IHGIRRSAGCGDWGQKPNATACPDEASC 82 

Qy 68 AKNCCLDGA AYASTYGVTTSGNSLSIGFVTQSAQKN- -VGARLYLMASD-TTYQEFT 121 

MM II M : hlllll I : I II hlh : h 

Db 83 AKNCILSGMDSNAYKNA-GITTSGNKLRL QQLINNQLVSPRVYLLEENKKKYEMLQ 137 

Qy 122 LLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQC PR 178 

I I lllllh: :||l|:|illl I MM : III II lllhll I 

Db 138 LTGTEFSFDVEMEKLPCGMNGALYLSEMPQDGGKSTSRNSKAGAYYGAGYCDAQCYVTP- 196 

Qy 179 DLKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEIC 238 

MM |:: I I IhhIIIIIII : : lllh I i 

Db 197 ---FINGVGNIK GQGVCCNELDIWEANSRATHIAPHPCSKPGLYGC 239 

Qy 239 EGDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSG 298

Mil: I II II II h I III I : :h|:| II HI : 

Db 240 TGDECGSS GFCDKAGCGWNHNRINVTDFYGRGKQYKVDSTRKFTVTSQFVANK 292

Qy 299 AINRYYVQNGVTFQQPNAEL-GSYSGNELNDDYCTAEEAEFGGSSFSDKGGLTQF 352 

-hhh : : I I :M II I I : : II I 

Db 293 QGDLIELHRHYIQDNKVIESAWNISGPPKINFINDKYCAAT GANEYMRLGGTKQM 348 

Qy 353 KKATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSP 412 

I I MM Ihl Mil I M : II : I 

Db 349 GDAMSRGMVLAMSVWWSEGDFMAWLDQ GVAGPCDATEGDPKNIVKVQP 396

Qy 413 NAKVTFSNIKFGPIGSTGNPSGGNPPG 439 

I Mllllh I MM : II 
Db 397 NPEVTFSNIRIGEIGSTSSVKAPAYPG 423 



RESULT 39 
US-08-479-939-25 

Sequence 25, Application US/08479939 
Patent No. 5686593 
GENERAL INFORMATION: 



APPLICANT: Woeldike, Helle F.
APPLICANT: Hagen, Frederick
APPLICANT: Hjort, Carsten M.
APPLICANT: Sven, Hastrup

TITLE OF INVENTION: An Enzyme Capable of Degrading Cellulose 
TITLE OF INVENTION: or Hemicellulose 
NUMBER OF SEQUENCES: 85 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Novo Nordisk of North America, Inc.

STREET: 405 Lexington Avenue, 62nd Floor 

CITY: New York 

STATE: New York 

COUNTRY: United States of America 

ZIP: 10174-6201 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.25
CURRENT APPLICATION DATA:

APPLICATION NUMBER: US/08/479,939

FILING DATE: 07-JUN-1995 



CLASSIFICATION: 435
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US/08/361,920 
FILING DATE: 22-DEC-1994 
APPLICATION NUMBER: US 07/940,860 
FILING DATE: 28-OCT-1992 
APPLICATION NUMBER: DK 1158/90 
FILING DATE: 09-MAY-1990
PRIOR APPLICATION DATA:

APPLICATION NUMBER: PCT/DK91/00124
FILING DATE: 08-MAY-1991
ATTORNEY/AGENT INFORMATION:
NAME: Lambiris, Elias J.
REGISTRATION NUMBER: 33,728 
REFERENCE/DOCKET NUMBER: 3435.204-US 
TELECOMMUNICATION INFORMATION: 
TELEPHONE: 212-867-0123 
TELEFAX: 212-867-0298 
INFORMATION FOR SEQ ID NO: 25: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 427 amino acids 
TYPE: amino acid 
TOPOLOGY: linear 
MOLECULE TYPE: protein 
US-08-479-939-25 

Query Match 27.4%; Score 750.5; DB 1; Length 427; 

Best Local Similarity 38.3%; Pred. No. 9.9e-50; 

Matches 171; Conservative 51; Mismatches 160; Indels 65; Gaps 15; 

Qy 9 ETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYD-GNTWSSTLCPDNEXC 67

I II I H: I :|| :| II I I M I : = ! Ill H 

Db 26 EQHPKLETYRCTKASGCKKQTNYIVADAG IHGIRRSAGCGDWGQKPNATACPDEASC 82 

Qy 68 AKNCCLDGA AYASTYGVTTSGNSLSIGFVTQSAQKN- -VGARLYLMASD-TTYQEFT 121 

MM II II : hlllll I : I II |:||: : h 

Db 83 AKNCILSGMDSNAYKNA-GITTSGNKLRL QQLINNQLVSPRVYLLEENKKKYEMLQ 137 

Qy 122 LLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQC PR 178 

I I Mlllh: Mllhlllll I III I : III II nihil I 

Db 138 LTGTEFSFDVEMEKLPCGMNGALYLSEMPQDGGKSTSRNSKAGAYYGAGYCDAQCYVTP- 196 

Qy 179 DLKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEIC 238 

MM h: I I IhhIIIIIII : : lll|: I I 

Db 197 ---FINGVGNIK GQGVCCNELDIWEANSRATHIAPHPCSKPGLYGC 239 

Qy 239 EGDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSG 298

WW'- I II II II h I III I : :|:hl II Ml : 

Db 240 TGDECGSS GFCDKAGCGWNHNRINVTDFYGRGKQYKVDSTRKFTVTSQFVANK 292 

Qy 299 AINRYYVQNGVTFQQPNAEL-GSYSGNELNDDYCTAEEAEFGGSSFSDKGGLTQF 352 

-hhh : : I I Ml II I I : : II I 

Db 293 QGDLIELHRHYIQDNKVIESAWNISGPPKINFINDKYCAAT GANEYMRLGGTKQM 348 

Qy 353 KKATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSP 412 

I I MM MM I Ml I I I : I I : I 

Db 349 GDAMSRGMVLAMSVWWSEGDFMAWLDQ GVAGPCDATEGDPKNIVKVQP 396 

Qy 413 NAKVTFSNIKFGPIGSTGNPSGGNPPG 439 

I Mllllh I MM : II 
Db 397 NPEVTFSNIRIGEIGSTSSVKAPAYPG 423 



RESULT 40 

US-08-483-432-25 

; Sequence 25, Application US/08483432 
; Patent No. 5763254 



; GENERAL INFORMATION: 

APPLICANT: Woeldike, Helle F. 
; APPLICANT: Hagen, Frederick
APPLICANT: Hjort, Carsten M.
; APPLICANT: Sven, Hastrup 

TITLE OF INVENTION: An Enzyme Capable of Degrading Cellulose 
TITLE OF INVENTION: or Hemicellulose 
; NUMBER OF SEQUENCES: 85 
; CORRESPONDENCE ADDRESS: 

ADDRESSEE: Novo Nordisk of North America, Inc.
; STREET: 405 Lexington Avenue, 62nd Floor 

; CITY: New York 

; STATE: New York 

COUNTRY: United States of America 
ZIP: 10174-6201 
; COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS
SOFTWARE: PatentIn Release #1.0, Version #1.25
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/483,432 
FILING DATE: 07-JUN-1995 
CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US/08/361,920
FILING DATE: 

APPLICATION NUMBER: US 07/940,860 

FILING DATE: 28-OCT-1992 

APPLICATION NUMBER: DK 1158/90 
; FILING DATE: 09-MAY-1990 

PRIOR APPLICATION DATA: 

APPLICATION NUMBER: PCT/DK91/00124 

FILING DATE: 08-MAY-1991 
ATTORNEY/AGENT INFORMATION: 
; NAME: Lambiris, Elias J. 

REGISTRATION NUMBER: 33,728 

REFERENCE/DOCKET NUMBER: 3435.204-US
TELECOMMUNICATION INFORMATION: 

TELEPHONE: 212-867-0123 
; TELEFAX: 212-867-0298 

; INFORMATION FOR SEQ ID NO: 25: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 427 amino acids 
; TYPE: amino acid 

TOPOLOGY: linear 
; MOLECULE TYPE: protein 
US-08-483-432-25 

Query Match 27.4%; Score 750.5; DB 1; Length 427; 

Best Local Similarity 38.3%; Pred. No. 9.9e-50; 

Matches 171; Conservative 51; Mismatches 160; Indels 65; Gaps 15; 
Qy 9 ETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYD-GNTWSSTLCPDNEXC 67

Db 26 EQHPKLETYRCTKASGCKKQTNYIVADAG---IHGIRRSAGCGDWGQKPNATACPDEASC

Qy 68 AKNCCLDGA AYASTYGVTTSGNSLSIGFVTQSAQKN--VGARLYLMASD-TTYQEFT 121

Db

Qy 122 LLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQC PR 178

Db

Qy 179 DLKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEIC 238



Db 197 ---FINGVGNIK GQGVCCNELDIWEANSRATHIAPHPCSKPGLYGC 239 



Qy 239 EGDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSG 298

II II : I II II II h I III I : :|:hl II HI 

Db 240 TGDECGSS GFCDKAGCGWNHNRINVTDFYGRGKQYKVDSTRKFTVTSQFVANK 292 

Qy 299 AINRYYVQNGVTFQQPNAEL-GSYSGNELNDDYCTAEEAEFGGSSFSDKGGLTQF 352 

-hhh : : I I :|| III I : = II I 

Db 293 QGDLIELHRHYIQDNKVIESAWNISGPPKINFINDKYCAAT GANEYMRLGGTKQM 348 

Qy 353 KKATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSP 412 

I I MM MM I Ml I I I : I I : I 

Db 349 GDAMSRGMVLAMSVWWSEGDFMAWLDQ GVAGPCDATEGDPKNIVKVQP 396 

Qy 413 NAKVTFSNIKFGPIGSTGNPSGGNPPG 439 

I Mllllh I MM : II 
Db 397 NPEVTFSNIRIGEIGSTSSVKAPAYPG 423 



RESULT 41 
US-08-709-974A-6 

Sequence 6, Application US/08709974A 
Patent No. 6117664 
GENERAL INFORMATION: 

APPLICANT: Schülein, Martin
APPLICANT: Rosholm, Peter
APPLICANT: Nielsen, Jack Bech
APPLICANT: Hansen, Svend Aage
APPLICANT: von der Osten, Claus

TITLE OF INVENTION: Novel Alkaline Cellulases
NUMBER OF SEQUENCES: 11
CORRESPONDENCE ADDRESS:

ADDRESSEE: Novo Nordisk of North America, Inc.
STREET: 405 Lexington Avenue, 64th Floor 
CITY: New York 
STATE: New York 

COUNTRY: United States of America 
ZIP: 10174-6401 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS
SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA:

APPLICATION NUMBER: US/08/709,974A
FILING DATE: 09-SEP-1996 
CLASSIFICATION: 435 
ATTORNEY/AGENT INFORMATION: 
NAME: Gregg, Valeta 
REGISTRATION NUMBER: 35,127
REFERENCE/DOCKET NUMBER: 4160.414-US
TELECOMMUNICATION INFORMATION: 
TELEPHONE: 212-867-0123 
TELEFAX: 212-878-9655 
INFORMATION FOR SEQ ID NO: 6: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 411 amino acids 
TYPE: amino acid 
STRANDEDNESS: single
TOPOLOGY: linear 
MOLECULE TYPE: protein 
US-08-709-974A-6 

Query Match 27.1%; Score 741.5; DB 2; Length 411; 

Best Local Similarity 38.0%; Pred. No. 4.6e-49; 

Matches 170; Conservative 52; Mismatches 162; Indels 63; Gaps 15; 



Qy 9 ETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYD-GNTWSSTLCPDNEXC 67

I II I I :|| :| II Mill :U III H 

Db 8 EQHPKLETYRCTKASGCKKQTNYIVADAGIHGIRQKNGA-GCGDWGQKPNATACPDEASC 66 

Qy 68 AKNCCLDGA AYASTYGVTTSGNSLSIGFVTQSAQKN- -VGARLYLMASD-TTYQEFT 121 

MM II II : hlMII I : I II hlh : h 

Db 67 AKNCILSGMDSNAYKNA-GITTSGNKLRL QQLINNQLVSPRVYLLEENKKKYEMLH 121 

Qy 122 LLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQC PR 178 

I I Mlllh: MllhlMII i III I : III II nihil I 

Db 122 LTGTEFSFDVEMEKLPCGMNGALYLSEMPQDGGKSTSRNSKAGAYYGAGYCDAQCYVTP- 180

Qy 179 DLKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEIC 238 

MM I I IhhIMMM : : lllh I I 

Db 181 ---FINGVGNIK GQGVCCNELDIWEANSRATHIAPHPCSKPGLYGC 223 

Qy 239 EGDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSG 298 

MM: I II II II |: llll I : :|:|:| || :|| : 

Db 224 TGDECGSS GICDKAGCGWNHNRINVTDFYGRGKQYKVDSTRKFTVTSQFVANK 276 

Qy 299 AINRYYVQNGVTFQQPNAEL-GSYSGNELNDDYCTAEEAEFGGSSFSDKGGLTQF 352

-hhh : : I I :|| II I | : : |1 | 

Db 277 QGDLIELHRHYIQDNKVIESAWNISGPPKINFINDKYCAAT GANEYMRLGGTKQM 332 

Qy 353 KKATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSP 412 

I I llll MM llll I I I : 1 I : I 

Db 333 GDAMSRGMVLAMSVWWSEGDFMAWLDQ GVAGPCDATEGDPKNIVKVQP 380 

Qy 413 NAKVTFSNIKFGPIGSTGNPSGGNPPG 439

I MIIMh I MM : II 

Db 381 NPEVTFSNIRIGEIGSTSSVKAPAYPG 407



RESULT 42 
US-09-069-632-1 

; Sequence 1, Application US/09069632 

; Patent No. 6261828 

; GENERAL INFORMATION: 

APPLICANT: Lund, Henrik 
; TITLE OF INVENTION: A Process For Combined Desizing 
TITLE OF INVENTION: And Stone-Washing of Dyed Denim
NUMBER OF SEQUENCES: 3
CORRESPONDENCE ADDRESS:

ADDRESSEE: Novo Nordisk of North America, Inc.
STREET: 405 Lexington Avenue
CITY: New York
; STATE: NY

; COUNTRY: USA

ZIP: 10174 
; COMPUTER READABLE FORM: 
MEDIUM TYPE: Diskette 
COMPUTER: IBM Compatible 
OPERATING SYSTEM: DOS 
; SOFTWARE: FastSEQ for Windows Version 2.0 

CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/09/069,632
FILING DATE: 29-APR-1998
CLASSIFICATION: 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: PCT/DK96/00469 
; FILING DATE: 15-NOV-1996

APPLICATION NUMBER: 1278/95
FILING DATE: 15-NOV-1995
ATTORNEY/AGENT INFORMATION: 
; NAME: Gregg, Valeta 

REGISTRATION NUMBER: 35,127 
; REFERENCE/DOCKET NUMBER: 4588.204-US 



TELECOMMUNICATION INFORMATION:
TELEPHONE: 212-867-0123
TELEFAX: 212-878-9655
INFORMATION FOR SEQ ID NO: 1:
SEQUENCE CHARACTERISTICS:
LENGTH: 415 amino acids
TYPE: amino acid
STRANDEDNESS: single
TOPOLOGY: linear 
MOLECULE TYPE: protein 
US-09-069-632-1 

Query Match 27.0%; Score 739.5; DB 2; Length 415; 

Best Local Similarity 37.6%; Pred. No. 6.7e-49; 

Matches 170; Conservative 58; Mismatches 155; Indels 69; Gaps 16; 

Qy 9 ETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATN--SSTNCYD-GNTWSSTLCPDNE 65

I II II :|: I I I :h|: I MM HI! I 

Db 8 EVHPQLTTFRCTKRGGCKPATNFIVLDSLSHPIHRAEGLGPGGCGDWGNPPPKDVCPDVE 67 

Qy 66 XCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLMASDTT YQEFTL 122 

HUM ::| I llllhl II : : : |:||: | | h I 

Db 68 SCAKNCIMEGIPDYSQYGVTTNGTSLRLQHILPDG-RVPSPRVYLL--DKTKRRYEMLHL 124 

Qy 123 LGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQC PRD 179 

I Ihllll ::|lli:| III I I ill I II llllllhit I 
Db 125 TGFEFTFDVDATKLPCGMNSALYLSEMHPTGAKSKY--NPGGAYYGTGYCDAQCFVTP-- 180 

Qy 180 LKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE 239 

III! hi I lllhlllllllll : : III I :|| 

Db 181 --FINGLGNIE GKGSCCNEMDIWEANSRASHVAPHTCNKKGLYLCE 224 

Qy 240 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETS-- 297

hi : I II :|| II Ih I :|| I I ::| I llllll : 

Db 225 GEECA FEGVCDKNGCGWNNYRVNVTDYYGRGEEFKVNTLKPFTVVTQFLANRR 277

Qy 298 GAINRYYVQNGVTFQQ--PNAELGSYSGNELNDDYCTAEEAEFGGSSFSDKGGLTQF 352 

hhllhl : I I h I ::h:| I I : : I 

Db 278 GKLEKIHRFYVQDGKVIESFYTNKEGVPYT-NMIDDEFCEAT GSRKYMELGATQGM 332

Qy 353 KKATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSP 412 

•■\ ■■ MM Ihl I II III I |: I |: : | 

Db 333 GEALTRGMVLAMSIWWDQGGNMEWLDH GEAGPCAKGEGAPSNIVQVEP 380

Qy 413 NAKVTFSNIKFGPIGST GNPSGGNPP 438 

Hh:h::| MM I h I 

Db 381 FPEVTYTNLRWGEIGSTYQEVQKPKPKPGHGP 412 



RESULT 43 
US-08-709-974A-5 

Sequence 5, Application US/08709974A 
Patent No. 6117664 
GENERAL INFORMATION: 

APPLICANT: Schülein, Martin
APPLICANT: Rosholm, Peter
APPLICANT: Nielsen, Jack Bech
APPLICANT: Hansen, Svend Aage
APPLICANT: von der Osten, Claus

TITLE OF INVENTION: Novel Alkaline Cellulases
NUMBER OF SEQUENCES: 11
CORRESPONDENCE ADDRESS:

ADDRESSEE: Novo Nordisk of North America, Inc.
STREET: 405 Lexington Avenue, 64th Floor 
CITY: New York 
STATE: New York 

COUNTRY: United States of America 



ZIP: 10174-6401 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS
SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA:

APPLICATION NUMBER: US/08/709,974A
FILING DATE: 09-SEP-1996 
CLASSIFICATION: 435 
ATTORNEY/AGENT INFORMATION:
NAME: Gregg, Valeta
REGISTRATION NUMBER: 35,127
REFERENCE/DOCKET NUMBER: 4160.414-US
TELECOMMUNICATION INFORMATION:
TELEPHONE: 212-867-0123
TELEFAX: 212-878-9655 
INFORMATION FOR SEQ ID NO: 5: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 402 amino acids 
TYPE: amino acid 
STRANDEDNESS: single
TOPOLOGY: linear 
MOLECULE TYPE: protein 
US-08-709-974A-5 

Query Match 26.9%; Score 737.5; DB 2; Length 402; 

Best Local Similarity 38.2%; Pred. No. 9.2e-49; 

Matches 167; Conservative 57; Mismatches 150; Indels 63; Gaps 15; 

Qy 9 ETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATN--SSTNCYD-GNTWSSTLCPDNE 65

I II II :h I I I H:h I I I II :||| I 

Db 8 EVHPQLTTFRCTKRGGCKPATNFIVLDSLSHPIHRAEGLGPGGCGDWGNPPPKDVCPDVE 67 

Qy 66 XCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLMASDTT YQEFTL 122 

HUM :H I llllhl II : : : hlh II h I 

Db 68 SCAKNCIMEGIPDYSQYGVTTNGTSLRLQHILPDG-RVPSPRVYLL--DKTKRRYEMLHL 124 

Qy 123 LGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQC PRD 179 

I Ihllll ::|ll|:| Ml I I Ml I II IMMIhll I 
Db 125 TGFEFTFDVDATKLPCGMNSALYLSEMHPTGAKSKY--NPGGAYYGTGYCDAQCFVTP-- 180 

Qy 180 LKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE 239 

MM hi I MlhlMMMM : : II I I Ml 

Db 181 --FINGLGNIE GKGSCCNEMDIWEANSRASHVAPHTCNKKGLYLCE 224 

Qy 240 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETS-- 297

I M II Ml IMh I MM I :M I III II I : 

Db 225 GEECA FEGVCDKNGCGWNNYRVNVTDYYGRGEEFKVNTLKPFTVVTQFLANRR 277 

Qy 298 GAINRYYVQNGVTFQQ--PNAELGSYSGNELNDDYCTAEEAEFGGSSFSDKGGLTQF 352 

IMMMM : M h I :M:M I I : : I 

Db 278 GKLEKIHRFYVQDGKVIESFYTNKEGVPYT-NMIDDEFCEAT GSRKYMELGATQGM 332 

Qy 353 KKATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSP 412 

M : MM MM I M III I h I h : I 

Db 333 GEALTRGMVLAMSIWWDQGGNMEWLDH GEAGPCAKGEGAPSNIVQVEP 380 

Qy 413 NAKVTFSNIKFGPIGST 429 

MhM::M MM 
Db 381 FPEVTYTNLRWGEIGST 397 



RESULT 44 

US-08-361-920-27 

; Sequence 27, Application US/08361920 
; Patent No. 5457046 



GENERAL INFORMATION: 

APPLICANT: Woeldike, Helle F. 

APPLICANT: Hagen, Frederick 
APPLICANT: Hjort, Carsten M.
APPLICANT: Sven, Hastrup 

TITLE OF INVENTION: An Enzyme Capable of Degrading Cellulose 
TITLE OF INVENTION: or Hemicellulose 
NUMBER OF SEQUENCES: 85 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Novo Nordisk of North America, Inc.
STREET: 405 Lexington Avenue, 62nd Floor 
CITY: New York 
STATE: New York 

COUNTRY: United States of America 
ZIP: 10174-6201 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS
SOFTWARE: PatentIn Release #1.0, Version #1.25
CURRENT APPLICATION DATA:

APPLICATION NUMBER: US/08/361,920
FILING DATE: 
CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 07/940,860 
FILING DATE: 28-OCT-1992
APPLICATION NUMBER: DK 1158/90 
FILING DATE: 09-MAY-1990 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: PCT/DK91/00124 
FILING DATE: 08-MAY-1991
ATTORNEY/AGENT INFORMATION:
NAME: Lambiris, Elias J.
REGISTRATION NUMBER: 33,728
REFERENCE/DOCKET NUMBER: 3435.204-US 
TELECOMMUNICATION INFORMATION: 
TELEPHONE: 212-867-0123 
TELEFAX: 212-867-0298 
INFORMATION FOR SEQ ID NO: 27: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 435 amino acids 
TYPE: amino acid 
TOPOLOGY: linear 
MOLECULE TYPE: protein 
US-08-361-920-27 

Query Match 26.9%; Score 737.5; DB 1; Length 435; 

Best Local Similarity 37.4%; Pred. No. 1e-48;

Matches 169; Conservative 59; Mismatches 155; Indels 69; Gaps 16; 

Qy 9 ETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATN--SSTNCYD-GNTWSSTLCPDNE 65

I II II :|: I I I :|:h I I I II :||| I 

Db 28 EVHPQLTTFRCTKRGGCKPATNFIVLDSLSHPIHRAEGLGPGGCGDWGNPPPKDVCPDVE 87 

Qy 66 XCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLMASDTT YQEFTL 122 

Hill! ::| I llllhl II : : : hlh I I h I 

Db 88 SCAKNCIMEGIPDYSQYGVTTNGTSLRLQHILPDG-RVPSPRVYLL--DKTKRRYEMLHL 144 

Qy 123 LGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQC PRD 179 

I Ihllll ::||lhl III I I III h II llllllhll I 
Db 145 TGFEFTFDVDATKLPCGMNSALYLSEMHPTGAKSKY--NSGGAYYGTGYCDAQCFVTP-- 200 

Qy 180 LKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE 239 

MM hi I lllhllllll II : : II I I Ml 

Db 201 --FINGLGNIE GKGSCCNEMDIWEVNSRASHVVPHTCNKKGLYLCE 244



Qy 240 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETS-- 297

hi UN :|| II l|: I :||l I ::| I Ijjljj : 

Db 245 GEECA FEGVCDKNGCGWNNYRVNVTDYYGRGEEFKVNTLKPFTVVTQFLANRR 297

Qy 298 GAINRYYVQNGVTFQQ--PNAELGSYSGNELNDDYCTAEEAEFGGSSFSDKGGLTQF 352 

hhllhl : II |: I =:|::| I I : : I 

Db 298 GKLEKIHRFYVQDGKVIESFYTNKEGVPYT-NMIDDEFCEAT GSRKYMELGATQGM 352

Qy 353 KKATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSP 412 

:| WW Ihl I II III I h I h : I 

Db 353 GEALTRGMVLAMSIWWDQGGNMEWLDH GEAGPCAKGEGAPSNIVQVEP 400

Qy 413 NAKVTFSNIKFGPIGST GNPSGGNPP 438 

:||::|:::| MM I h t 

Db 401 FPEVTYTNLRWGEIGSTYQEVQKPKPKPGHGP 432 



RESULT 45 
US-08-479-939-27 

Sequence 27, Application US/08479939 
Patent No. 5686593 
GENERAL INFORMATION: 

APPLICANT: Woeldike, Helle F. 
APPLICANT: Hagen, Frederick 
APPLICANT: Hjort, Carsten M.
APPLICANT: Sven, Hastrup 

TITLE OF INVENTION: An Enzyme Capable of Degrading Cellulose 
TITLE OF INVENTION: or Hemicellulose 
NUMBER OF SEQUENCES: 85 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Novo Nordisk of North America, Inc.
STREET: 405 Lexington Avenue, 62nd Floor 
CITY: New York 
STATE: New York 

COUNTRY: United States of America 
ZIP: 10174-6201 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS
SOFTWARE: PatentIn Release #1.0, Version #1.25
CURRENT APPLICATION DATA:

APPLICATION NUMBER: US/08/479,939
FILING DATE: 07-JUN-1995 
CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US/08/361,920
FILING DATE: 22-DEC-1994
APPLICATION NUMBER: US 07/940,860
FILING DATE: 28-OCT-1992
APPLICATION NUMBER: DK 1158/90 
FILING DATE: 09-MAY-1990 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: PCT/DK91/00124 
FILING DATE: 08-MAY-1991
ATTORNEY/AGENT INFORMATION: 
NAME: Lambiris, Elias J. 
REGISTRATION NUMBER: 33,728 
REFERENCE/DOCKET NUMBER: 3435.204-US 
TELECOMMUNICATION INFORMATION: 
TELEPHONE: 212-867-0123
TELEFAX: 212-867-0298
INFORMATION FOR SEQ ID NO: 27: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 435 amino acids 
TYPE: amino acid 
TOPOLOGY: linear 



MOLECULE TYPE: protein 
US-08-479-939-27 



Query Match 26.9%; Score 737.5; DB 1; Length 435; 

Best Local Similarity 37.4%; Pred. No. 1e-48;

Matches 169; Conservative 59; Mismatches 155; Indels 69; Gaps 16; 

Qy 9 ETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATN--SSTNCYD-GNTWSSTLCPDNE 65

I II II :h I I I :h|: I I I II :||| I 

Db 28 EVHPQLTTFRCTKRGGCKPATNFIVLDSLSHPIHRAEGLGPGGCGDWGNPPPKDVCPDVE 87 

Qy 66 XCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLMASDTT YQEFTL 122 

:||||l ::| I llllhl II : : : hlh I I h I 

Db 88 SCAKNCIMEGIPDYSQYGVTTNGTSLRLQHILPDG-RVPSPRVYLL--DKTKRRYEMLHL 144 

Qy 123 LGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQC PRD 179 

I Ihllll ::|llhl III I I III h II llilllhll I 
Db 145 TGFEFTFDVDATKLPCGMNSALYLSEMHPTGAKSKY--NSGGAYYGTGYCDAQCFVTP-- 200 

Qy 180 LKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE 239 

MM hi I MMMMMI M : : M I I Ml 

Db 201 --FINGLGNIE GKGSCCNEMDIWEVNSRASHVVPHTCNKKGLYLCE 244

Qy 240 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETS-- 297

hi MM Ml M Ih I MM I :M I llllll : 

Db 245 GEECA FEGVCDKNGCGWNNYRVNVTDYYGRGEEFKVNTLKPFTVVTQFLANRR 297

Qy 298 GAINRYYVQNGVTFQQ--PNAELGSYSGNELNDDYCTAEEAEFGGSSFSDKGGLTQF 352 

IMMIIM : I I h I :M:M I I = : I 

Db 298 GKLEKIHRFYVQDGKVIESFYTNKEGVPYT-NMIDDEFCEAT GSRKYMELGATQGM 352 

Qy 353 KKATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSP 412 

■■\ Mil Ihl I II III I h Ih : I 

Db 353 GEALTRGMVLAMSIWWDQGGNMEWLDH - -GEAGPCAKGEGAPSNIVQVEP 400

Qy 413 NAKVTFSNIKFGPIGST GNPSGGNPP 438 

MhM::M MM I h I 

Db 401 FPEVTYTNLRWGEIGSTYQEVQKPKPKPGHGP 432 



RESULT 46 
US-08-483-432-27 

; Sequence 27, Application US/08483432 
; Patent No. 5763254 
; GENERAL INFORMATION: 

APPLICANT: Woeldike, Helle F. 
; APPLICANT: Hagen, Frederick
APPLICANT: Hjort, Carsten M.
; APPLICANT: Sven, Hastrup 

TITLE OF INVENTION: An Enzyme Capable of Degrading Cellulose 
TITLE OF INVENTION: or Hemicellulose 
NUMBER OF SEQUENCES: 85 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Novo Nordisk of North America, Inc.
STREET: 405 Lexington Avenue, 62nd Floor 
CITY: New York 
STATE: New York 

COUNTRY: United States of America 

ZIP: 10174-6201 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
; COMPUTER: IBM PC compatible 

; OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.25
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/483,432 
; FILING DATE: 07-JUN-1995 



CLASSIFICATION: 435
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US/08/361,920
FILING DATE:

APPLICATION NUMBER: US 07/940,860
FILING DATE: 28-OCT-1992
APPLICATION NUMBER: DK 1158/90 
FILING DATE: 09-MAY-1990 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: PCT/DK91/00124 
FILING DATE: 08-MAY-1991
ATTORNEY/AGENT INFORMATION:
NAME: Lambiris, Elias J.
REGISTRATION NUMBER: 33,728
REFERENCE/DOCKET NUMBER: 3435.204-US
TELECOMMUNICATION INFORMATION: 
TELEPHONE: 212-867-0123 
TELEFAX: 212-867-0298 
INFORMATION FOR SEQ ID NO: 27: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 435 amino acids 
TYPE: amino acid 
TOPOLOGY: linear 
MOLECULE TYPE: protein 
US-08-483-432-27 

Query Match 26.9%; Score 737.5; DB 1; Length 435; 

Best Local Similarity 37.4%; Pred. No. 1e-48;

Matches 169; Conservative 59; Mismatches 155; Indels 69; Gaps 16; 

Qy 9 ETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATN--SSTNCYD-GNTWSSTLCPDNE 65

I II II :h II I :h|: I I III :||| I 

Db 28 EVHPQLTTFRCTKRGGCKPATNFIVLDSLSHPIHRAEGLGPGGCGDWGNPPPKDVCPDVE 87 

Qy 66 XCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLMASDTT YQEFTL 122 

HIIII ::| I llllhl II : : : hlh I I h I 

Db 88 SCAKNCIMEGIPDYSQYGVTTNGTSLRLQHILPDG-RVPSPRVYLL--DKTKRRYEMLHL 144 

Qy 123 LGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQC PRD 179 

I Ihllll ::|Mhl III I I III h II llllllhll I 
Db 145 TGFEFTFDVDATKLPCGMNSALYLSEMHPTGAKSKY--NSGGAYYGTGYCDAQCFVTP-- 200 

Qy 180 LKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE 239 

nil hi I lllhllllll II : : II I I :|| 

Db 201 --FINGLGNIE GKGSCCNEMDIWEVNSRASHVVPHTCNKKGLYLCE 244

Qy 240 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETS-- 297

hi : I II :|| II Ih I :|| I I ::| I llllll : 

Db 245 GEECA FEGVCDKNGCGWNNYRVNVTDYYGRGEEFKVNTLKPFTVVTQFLANRR 297

Qy 298 GAINRYYVQNGVTFQQ--PNAELGSYSGNELNDDYCTAEEAEFGGSSFSDKGGLTQF 352 

hhllhl : I I h I ::h:| I I : : I 

Db 298 GKLEKIHRFYVQDGKVIESFYTNKEGVPYT-NMIDDEFCEAT GSRKYMELGATQGM 352 

Qy 353 KKATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSP 412 

:| : MM Ihl I II III I |: I |: : | 

Db 353 GEALTRGMVLAMSIWWDQGGNMEWLDH GEAGPCAKGEGAPSNIVQVEP 400 

Qy 413 NAKVTFSNIKFGPIGST GNPSGGNPP 438 

Mh:h::| MM I h I 

Db 401 FPEVTYTNLRWGEIGSTYQEVQKPKPKPGHGP 432 



RESULT 47 

US-09-069-632-3 

; Sequence 3, Application US/09069632 
; Patent No. 6261828 



; GENERAL INFORMATION:

APPLICANT: Lund, Henrik 
; TITLE OF INVENTION: A Process For Combined Desizing 
; TITLE OF INVENTION: And Stone-Washing of Dyed Denim 
NUMBER OF SEQUENCES: 3 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Novo Nordisk of North America, Inc.
; STREET: 405 Lexington Avenue 

CITY: New York 
STATE: NY
COUNTRY: USA
ZIP: 10174 
COMPUTER READABLE FORM: 
MEDIUM TYPE: Diskette 
COMPUTER: IBM Compatible 
OPERATING SYSTEM: DOS 
; SOFTWARE: FastSEQ for Windows Version 2.0 

CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/09/069,632
FILING DATE: 29-APR-1998 
CLASSIFICATION: 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: PCT/DK96/00469 
FILING DATE: 15-NOV-1996 
APPLICATION NUMBER: 1278/95 
FILING DATE: 15-NOV-1995 
ATTORNEY/AGENT INFORMATION:
NAME: Gregg, Valeta
REGISTRATION NUMBER: 35,127
; REFERENCE/DOCKET NUMBER: 4588.204-US

TELECOMMUNICATION INFORMATION: 
TELEPHONE: 212-867-0123 
TELEFAX: 212-878-9655 
INFORMATION FOR SEQ ID NO: 3:
; SEQUENCE CHARACTERISTICS: 

; LENGTH: 435 amino acids 

; TYPE: amino acid 

STRANDEDNESS: single
; TOPOLOGY: linear 

; MOLECULE TYPE: protein 
US-09-069-632-3 

Query Match 26.9%; Score 737.5; DB 2; Length 435; 

Best Local Similarity 37.4%; Pred. No. 1e-48;

Matches 169; Conservative 59; Mismatches 155; Indels 69; Gaps 16; 
Qy 9 ETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATN--SSTNCYD-GNTWSSTLCPDNE 65

Db

Qy 66 XCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLMASDTT YQEFTL 122

Db 88 SCAKNCIMEGIPDYSQYGVTTNGTSLRLQHILPDG-RVPSPRVYLL--DKTKRRYEMLHL

Qy 123 LGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQC PRD 179

Db

Qy 180 LKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE 239

Db 201 --FINGLGNIE GKGSCCNEMDIWEVNSRASHVVPHTCNKKGLYLCE

Qy 240 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETS-- 297

Db 245 GEECA FEGVCDKNGCGWNNYRVNVTDYYGRGEEFKVNTLKPFTVVTQFLANRR 2

Qy 298 ---GAINRYYVQNGVTFQQ--PNAELGSYSGNELNDDYCTAEEAEFGGSSFSDKGGLTQF 352

hhllhl : I I h I ::|::| I I = : I

Db 298 GKLEKIHRFYVQDGKVIESFYTNKEGVPYT-NMIDDEFCEAT GSRKYMELGATQGM 352

Qy 353 KKATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSP 412 

= 1 : MM MM I M Ml I |: Ih : I 

Db 353 GEALTRGMVLAMSIWWDQGGNMEWLDH GEAGPCAKGEGAPSNIVQVEP 400 

Qy 413 NAKVTFSNIKFGPIGST GNPSGGNPP 438 

:|h:|:::| MM 1 h I 

Db 401 FPEVTYTNLRWGEIGSTYQEVQKPKPKPGHGP 432 



RESULT 48 
US-08-709-974A-1 

Sequence 1, Application US/08709974A 
Patent No. 6117664 
GENERAL INFORMATION: 

APPLICANT: Schülein, Martin
APPLICANT: Rosholm, Peter
APPLICANT: Nielsen, Jack Bech
APPLICANT: Hansen, Svend Aage
APPLICANT: von der Osten, Claus

TITLE OF INVENTION: Novel Alkaline Cellulases
NUMBER OF SEQUENCES: 11
CORRESPONDENCE ADDRESS:

ADDRESSEE: Novo Nordisk of North America, Inc.
STREET: 405 Lexington Avenue, 64th Floor 
CITY: New York 
STATE: New York 

COUNTRY: United States of America 
ZIP: 10174-6401 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS
SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA:

APPLICATION NUMBER: US/08/709,974A
FILING DATE: 09-SEP-1996 
CLASSIFICATION: 435 
ATTORNEY/AGENT INFORMATION: 
NAME: Gregg, Valeta 
REGISTRATION NUMBER: 35,127
REFERENCE/DOCKET NUMBER: 4160.414-US
TELECOMMUNICATION INFORMATION:
TELEPHONE: 212-867-0123
TELEFAX: 212-878-9655 
INFORMATION FOR SEQ ID NO: 1: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 402 amino acids 
TYPE: amino acid 
STRANDEDNESS: single
TOPOLOGY: linear 
MOLECULE TYPE: protein 
US-08-709-974A-1 

Query Match 26.8%; Score 735.5; DB 2; Length 402; 

Best Local Similarity 38.0%; Pred. No. 1.3e-48; 

Matches 166; Conservative 58; Mismatches 150; Indels 63; Gaps 15; 

Qy 9 ETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATN--SSTNCYD-GNTWSSTLCPDNE 65

I il II M: I I I MM: I I I II MM I 

Db 8 EVHPQLTTFRCTKRGGCKPATNFIVLDSLSHPIHRAEGLGPGGCGDWGNPPPKDVCPDVE 67 

Qy 66 XCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLMASDTT YQEFTL 122 

MMM ::| I MMhl M : : : IMh I I h I 

Db 68 SCAKNCIMEGIPDYSQYGVTTNGTSLRLQHILPDG-RVPSPRVYLL--DKTKRRYEMLHL 124 



Qy 123 LGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQC PRD 179 

I Ihllll ::|IM:| III I I ill h jj llllllhll I 
Db 125 TGFEFTFDVDATKLPCGMNSALYLSEMHPTGAKSKY--NSGGAYYGTGYCDAQCFVTP-- 180

Qy 180 LKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE 239 

INI I '-I I IMhIIIIII 11 : : II I I :|| 

Db 181 --FINGLGNIE GKGSCCNEMDIWEVNSRASHVVPHTCNKKGLYLCE 224

Qy 240 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETS-- 297

hi : I II :|l II Ih I HI I I ::| I lllll! : 

Db 225 GEECA FEGVCDKNGCGWNNYRVNVTDYYGRGEEFKVNTLKPFTVVTQFLANRR 277

Qy 298 GAINRYYVQNGVTFQQ--PNAELGSYSGNELNDDYCTAEEAEFGGSSFSDKGGLTQF 352 

hhllhl : I I h I ::|::| I I : : I 

Db 278 GKLEKIHRFYVQDGKVIESFYTNKEGVPYT-NMIDDEFCEAT GSRKYMELGATQGM 332 

Qy 353 KKATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSP 412 

= 1 : ill! Ihl I II III I |: I |: : | 

Db 333 GEALTRGMVLAMSIWWDQGGNMEWLDH GEAGPCAKGEGAPSNIVQVEP 380 

Qy 413 NAKVTFSNIKFGPIGST 429 

:|h:h-l MM 
Db 381 FPEVTYTNLRWGEIGST 397 



RESULT 49
US-08-709-974A-4 

Sequence 4, Application US/08709974A 
Patent No. 6117664 
GENERAL INFORMATION: 

APPLICANT: Schülein, Martin
APPLICANT: Rosholm, Peter
APPLICANT: Nielsen, Jack Bech
APPLICANT: Hansen, Svend Aage
APPLICANT: von der Osten, Claus

TITLE OF INVENTION: Novel Alkaline Cellulases
NUMBER OF SEQUENCES: 11
CORRESPONDENCE ADDRESS:

ADDRESSEE: Novo Nordisk of North America, Inc.
STREET: 405 Lexington Avenue, 64th Floor 
CITY: New York 
STATE: New York 

COUNTRY: United States of America 
ZIP: 10174-6401 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS -DOS 
SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA:

APPLICATION NUMBER: US/08/709,974A
FILING DATE: 09-SEP-1996
CLASSIFICATION: 435
ATTORNEY/AGENT INFORMATION:
NAME: Gregg, Valeta
REGISTRATION NUMBER: 35,127
REFERENCE/DOCKET NUMBER: 4160.414-US
TELECOMMUNICATION INFORMATION:
TELEPHONE: 212-867-0123
TELEFAX: 212-878-9655
INFORMATION FOR SEQ ID NO: 4:
SEQUENCE CHARACTERISTICS: 
LENGTH: 415 amino acids 
TYPE: amino acid 
STRANDEDNESS : single 
TOPOLOGY: linear 



MOLECULE TYPE: protein 
US-08-709-974A-4 



Query Match 26.8%; Score 735.5; DB 2; Length 415; 

Best Local Similarity 37.4%; Pred. No. 1.4e-48; 

Matches 169; Conservative 58; Mismatches 156; Indels 69; Gaps 16; 

Qy 9 ETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATN--SSTNCYD-GNTWSSTLCPDNE 65 

I II II :h I I I :hh I I I II :||l I 

Db 8 EVHPQLTTFRCTKRGGCKPATNFIVLDSLSHPIHRAEGLGPGGCGDWGNPPPKDVCPDVE 67

Qy 66 XCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLMASDTT YQEFTL 122 

HIIII ::| I llllhl II : : : hlh II h I 

Db 68 SCAKNCIMEGIPDYSQYGVTTNGTSLRLQHILPDG-RVPSPRVYLL--DKTKRRYEMLHL 124 

Qy 123 LGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQC PRD 179 

I Ihllll ::||||:| III I I III I II llllllhll I 
Db 125 TGFEFTFDVDATKLPCGMNSALYLSEMHPTGAKSKY--NPGGAYYGTGYCDAQCFVTP-- 180 

Qy 180 LKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE 239 

MM IM I IMhIIMM M : : II I I Ml 

Db 181 --FINGLGNIE GKGSCCNEMDIWEVNSRASHWPHTCNKKGLYLCE 224 

Qy 240 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFETS-- 297 

hi : III Ml II Ih I MM I :M I Mini : 

Db 225 GEECA FEGVCDKNGCGWNNYRVNVTDYYGRGEEFKVNTLKPFTVVTQFLANRR 277 

Qy 298 GAINRYYVQNGVTFQQ--PNAELGSYSGNELNDDYCTAEEAEFGGSSFSDKGGLTQF 352

IMMIIM : M h I -hM I I : : I 

Db 278 GKLEKIHRFYVQDGKVIESFYTNKEGVPYT-NMIDDEFCEAT GSRKYMELGATQGM 332 

Qy 353 KKATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSP 412 

M : MM MM I II III I |: I |: : | 

Db 333 GEALTRGMVLAMSIWWDQGGNMEWLDH GEAGPCAKGEGAPSNIVQVEP 380 

Qy 413 NAKVTFSNIKFGPIGST GNPSGGNPP 438 

MhM-M MM I |: I 

Db 381 FPEVTYTNLRWGEIGSTYQEVQKPKPKPGHGP 412 



RESULT 50 
US-08-709-979A-3 

Sequence 3, Application US/08709979A
Patent No. 5912157
GENERAL INFORMATION:
APPLICANT: Claus von der Osten
APPLICANT: Martin Schülein
TITLE OF INVENTION: Novel Alkaline Cellulases
NUMBER OF SEQUENCES: 7
CORRESPONDENCE ADDRESS:
ADDRESSEE: Novo Nordisk of North America, Inc.
STREET: 405 Lexington Avenue, 64th Floor
CITY: New York
STATE: New York
COUNTRY: United States of America
ZIP: 10174-6401
COMPUTER READABLE FORM:
MEDIUM TYPE: Floppy disk
COMPUTER: IBM PC compatible
OPERATING SYSTEM: PC-DOS/MS-DOS
SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA:
APPLICATION NUMBER: US/08/709,979A
FILING DATE: 09-SEP-1996
CLASSIFICATION: 435
ATTORNEY/AGENT INFORMATION:
NAME: Lambiris, Elias J.
REGISTRATION NUMBER: 33,728
REFERENCE/DOCKET NUMBER: 4160.404-US
TELECOMMUNICATION INFORMATION:
TELEPHONE: 212-867-0123
TELEFAX: 212-878-9655
INFORMATION FOR SEQ ID NO: 3:
SEQUENCE CHARACTERISTICS:
LENGTH: 402 amino acids
TYPE: amino acid
STRANDEDNESS: single
TOPOLOGY: linear
MOLECULE TYPE: protein
US-08-709-979A-3

Query Match 26.5%; Score 725.5; DB 1; Length 402; 

Best Local Similarity 37.7%; Pred. No. 7.7e-48; 

Matches 166; Conservative 59; Mismatches 146; Indels 69; Gaps 16; 

Qy 9 ETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSST NCYD-GNTWSSTLCP 62 

I II II H: i I I H : :| : : III! :|| 

Db 8 EVHPQLTTFRCTKRGGCKPATNFIV DLSLSHPIHRAEGLGPGGCGDWGNPPPKDVCP 64 

Qy 63 DNEXCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLMASDTT YQE 119 

I hlllll ::| I llllhl II : : : hlh I I h 

Db 65 DVESCAKNCIMEGIPDYSQYGVTTNGTSLRLQHILPDG-RVPSPRVYLL--DKTKRRYEM 121 

Qy 120 FTLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQC 176 

I I Ihilll ::|llhl III I I III I II llllllhll 
Db 122 LHLTGFEFTFDVDATKLPCGMNSALYLSEMHPTGAKSKY--NPGGAYYGTGYCDAQCFVT 179 

Qy 177 PRDLKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQE 236 

I MM hi I MMMMMMM : : M I I 

Db 180 P FINGLGNIE GKGSCCNEMDIWEANSRASHVAPHTCNKKGLY 221 

Qy 237 ICEGDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFET 296 

MM: I : I II :|| || ||: j :|| | | ::| j |||||| 

Db 222 LCEGEECA FEGVCDKNGCGWNNYRVNVTDYYGRGEEFKVNTLKPFTWTQFLA 274 

Qy 297 S GAINRYYVQNGVTFQQ--PNAELGSYSGNELNDDYCTAEEAEFGGSSFSDKGGL 349 

IMMMM : I I h I ::|:M I I : : I 

Db 275 NRRGKLEKIHRFYVQDGKVIESFYTNKEGVPYT-NMIDDEFCEAT GSRKYMELGAT 329 

Qy 350 TQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVES 409 

M : MM MM I M Ml I h I h : 

Db 330 QGMGEALTRGMVLAMSIWWDQGGNMEWLDH GEAGPCAKGEGAPSNIVQ 377 

Qy 410 QSPNAKVTFSNIKFGPIGST 429 

I MhM::M MM 
Db 378 VEPFPEVTYTNLRWGEIGST 397 



RESULT 51 
US-08-833-642A-5 

Sequence 5, Application US/08833642A
Patent No. 5883066
GENERAL INFORMATION:
APPLICANT: Ivan M. A. J. Herbots et al.
TITLE OF INVENTION: Liquid Detergent Compositions
TITLE OF INVENTION: Containing Cellulase and Amine
NUMBER OF SEQUENCES: 5
CORRESPONDENCE ADDRESS:
ADDRESSEE: Jackie Ann Zurcher
ADDRESSEE: Dinsmore & Shohl LLP
STREET: 255 E. Fifth Street
STREET: 1900 Chemed Center
CITY: Cincinnati
STATE: Ohio
COUNTRY: USA
ZIP: 45202
COMPUTER READABLE FORM:
MEDIUM TYPE: Diskette, 3.5 inch
COMPUTER: IBM PC Compatible
OPERATING SYSTEM: MS-DOS
SOFTWARE: WordPerfect 6.1
CURRENT APPLICATION DATA:
APPLICATION NUMBER: US/08/833,642A
FILING DATE: April 8, 1997
ATTORNEY/AGENT INFORMATION:
NAME: Zurcher, J. A.
REGISTRATION NUMBER: P42,251
REFERENCE/DOCKET NUMBER: CM551C
TELECOMMUNICATION INFORMATION:
TELEPHONE: (513) 977-8377
TELEFAX: (513) 977-8141
INFORMATION FOR SEQ ID NO: 5:
SEQUENCE CHARACTERISTICS:
LENGTH: 415 amino acids
TYPE: amino acid
TOPOLOGY: linear
MOLECULE TYPE: protein
US-08-833-642A-5

Query Match 25.7%; Score 705.5; DB 1; Length 415; 

Best Local Similarity 37.3%; Pred. No. 2.8e-46; 

Matches 163; Conservative 59; Mismatches 152; Indels 63; Gaps 15; 

Qy 9 ETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATN--SSTNCYD-GNTWSSTLCPDNE 65 

I II il :|: I I I :|:|: I I I II :||| | 

Db 8 EVHPQLTTFRCTKRGGCKPATNFIVLDSLSHPIHRAEGLGPGGCGDHGNPPPKDVCPDVE 67 

Qy 66 XCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLMASDTT YQEFTL 122 

Ullll ::| I llllhl II : : : hjj: II h I 

Db 68 SCAKNCIMEGIPDYSQYGVTTNGTSLRLQHILPDG-RVPSPRVYLL--DKTKRRYEMLHL 124 

Qy 123 LGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQC PRD 179 

I Ihllll ::||||:| III I IN h II llllllhll I 

Db 125 TGFEFTFDVDATKLPCGMNSALYLFENHPTGAKSKY--NSGGAYYGTGYCDAQCFVTP-- 180 

Qy 180 LKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE 239 

MM hi I lllhllllll II : : II I I :|| 

Db 181 --FINGLGNIE GKGSCCNEMDIWEVNSRASHWPHTCNKKGLYLCE 224 

Qy 240 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFETS-- 297 

\-- \ =111 :|| :| Ih I :|| I I ::| I llllll : 

Db 225 GEECA FEGVCDKNGCGYNNYRVNVTDYYGRGEEFKVNTLKPFTWTQFLANRR 277 

Qy 298 GAINRYYVQNGVTFQQ--PNAELGSYSGNELNDDYCTAEEAEFGGSSFSDKGGLTQF 352 

hhllhl : II |: I ::|::|| | : : | 

Db 278 GRLEKIHRFYVQDGKVIESFYTNKEGVPYT-NMIDDEFCEAT GSRKYMELGATQGM 332 

Qy 353 KKATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSP 412 

M : MM Ihl I M II Ih I h : I 

Db 333 GEALTRGMVLAMSIWWDQGGNMENLDH GEAGPCAKGEGAPSNIVQVEP 380 

Qy 413 NAKVTFSNIKFGPIGST 429 

Mh:h::| MM 
Db 381 FPEVTYTNLRWGEIGST 397 



RESULT 4 
S38794 

cellulose 1,4-beta-cellobiosidase (EC 3.2.1.91) - imperfect fungus (Humicola grisea)
N;Alternate names: beta-glucancellobiohydrolase; exoglucanase
C;Species: Humicola grisea var. thermoidea
C;Date: 10-Sep-1999 #sequence_revision 10-Sep-1999 #text_change 09-Jul-2004
C;Accession: S38794; S08240; A45869
R;Radford, A.
submitted to the EMBL Data Library, June 1991
A;Reference number: S38794
A;Accession: S38794
A;Molecule type: DNA
A;Residues: 1-525 <RAD>
A;Cross-references: UNIPROT:P15828; UNIPARC:UPI000012BE0F; EMBL:X17258; NID:g2760; PIDN:CAA35159.1; PID:g2761
A;Note: this is a revision to the sequence from reference S08240
R;de Oliviera Azevedo, M.; Radford, A.
Nucleic Acids Res. 18, 668, 1990
A;Title: Sequence of cbh-1 gene of Humicola grisea var. thermoidea.
A;Reference number: S08240; MUID:90175006; PMID:2308855
A;Accession: S08240
A;Molecule type: DNA
A;Residues: 1-299,'H',301-525 <DEO>
A;Cross-references: UNIPARC:UPI00001729F6; EMBL:X17258
A;Note: the authors translated the codon CAG for residue 87 as His
A;Note: this sequence has been revised in reference S38794
R;Azevedo, M.; de, O.; Felipe, M.S.S.; Astolfi-Filho, S.; Radford, A.
J. Gen. Microbiol. 136, 2569-2576, 1990
A;Title: Cloning, sequencing and homologies of the cbh-1 (exoglucanase) gene of Humicola grisea var. thermoidea.
A;Reference number: A45869; MUID:91178527; PMID:2127803
A;Accession: A45869
A;Status: not compared with conceptual translation
A;Molecule type: DNA
A;Residues: 1-20,'R',22-34,'K',36-86,'H',88-141,'V',143-157,'Y',159-237,'QQH',241-244,'!»',246-299,'H',301-525 <A2E>
A;Cross-references: UNIPARC:UPI00001729F7; GB:M64588; GB:X17258
A;Note: this sequence has been revised. See entry S08240
C;Genetics:
A;Gene: cbh-1
A;Introns: 138/1
C;Superfamily: cellulose 1,4-beta-cellobiosidase I; fungal cellulose-binding domain homology
C;Keywords: glycosidase; hydrolase; polysaccharide degradation
F;494-525/Domain: fungal cellulose-binding domain homology <FCB>

Query Match 60.3%; Score 1652; DB 1; Length 525; 

Best Local Similarity 57.3%; Pred. No. 1.5e-91; 

Matches 294; Conservative 76; Mismatches 121; Indels 22; Gaps 7; 

Qy 1 QSACTLQSETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSSTNCYDGNTWSSTL 60 

I Ihl :| II hi IhHI I h :hllllll : Mill II I ::: 

Db 19 QQACSLTTERHPSLSWNKCTAGGQCQTVQASITLDSNWRWTHQVSGSTNCYTGNKWDTSI 78 

Qy 61 CPDNEXCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQK-NVGARLYLMASDTTYQE 119 

I I ::|hll|:||| I lllhlhhilh llh llhl III : II 

Db 79 CTDAKSCAQNCCVDGADYTSTYGITTNGDSLSLKFVTKGQHSTNVGSRTYLMDGEDKYQT 138 

Qy 120 FTLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRD 179

I lllllhllllll : llllllllllllllllhhil I llllllllllhlllll 

Db 139 FELLGNEFTFDVDVSNIGCGLNGALYFVSMDADGGLSRYPGNKAGAKYGTGYCDAQCPRD 198 

Qy 180 LKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE 239 

Hlllhlhlll hh I I I :|:||IIIIMIIh:: I llllll Ul II 
Db 199 IKFINGEANIEGWTGSTNDPNAGAGRYGTCCSEMDIWEANNMATAFTPHPCTIIGQSRCE 258 

Qy 240 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFETS-- 297 

II lllllh II I illlll|:| II II :||| I hllllhllllll 

Db 259 GDSCGGTYSNERYAGVCDPDGCDFNSYRQGNKTFYGKG--MTVDTTKKITVVTQFLKDAN 316 

Qy 298 GAINRYYVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFGG-SSFSDKGGLTQFK 353 

II hi Ihl : : II : hi II h llh I 

Db 317 GDLGEIKRFYVQDGKIIPNSESTIPGVEGNSITQDWCDRQKVAFGDIDDFNRKGGMKQMG 376 



Qy 354 KATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPN 413 

II :| lllllhllh :|IMIIIhl : : III Ihl |:|||||:|h::|| 

Db 377 KALAGPMVLVMSIWDDHASNMLWLDSTFPV-DAAGKPGAERGACPTTSGVPAEVEAEAPN 435 

Qy 414 AKVTFSNIKFGPIGST GNPSGGNPPGGNPPGTTTTRRPATTTGSSPGPTQS 464 

I lllhlllllll I :||lll II III: Mill :| II 
Db 436 SNWFSNIRFGPIGSTVAGLPGAGNGGNNGGNPP PPTTTTSSAPATTTTASAGPKAG 492 

Qy 465 HYGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL 497 

: iiiiihHil I I II HUM 

Db 493 RWQQCGGIGFTGPTQCEEPYICTKLNDWYSQCL 525

RESULT 6 
S42093 

cellulose 1,4-beta-cellobiosidase (EC 3.2.1.91) - Neurospora crassa
C;Species: Neurospora crassa
C;Date: 20-May-1994 #sequence_revision 10-Nov-1995 #text_change 09-Jul-2004
C;Accession: S42093
R;Taleb, F.; Radford, A.
submitted to the EMBL Data Library, February 1994
A;Description: Cloning sequencing and homologies of the CBH-1 (exocellobiohydrolase) gene of Neurospora crassa.
A;Reference number: S42093
A;Accession: S42093
A;Molecule type: DNA
A;Residues: 1-516 <TAL>
A;Cross-references: UNIPROT:P38676; UNIPARC:UPI000011D714; EMBL:X77778; NID:g456657; PIDN:CAA54815.1; PID:g456658
C;Genetics:
A;Introns: 227/3
C;Superfamily: cellulose 1,4-beta-cellobiosidase I; fungal cellulose-binding domain homology
C;Keywords: glycosidase; hydrolase; polysaccharide degradation
F;485-516/Domain: fungal cellulose-binding domain homology <FCB>

Query Match 57.0%; Score 1561; DB 2; Length 516; 

Best Local Similarity 57.5%; Pred. No. 4.1e-86; 

Matches 294; Conservative 62; Mismatches 129; Indels 26; Gaps 10; 

Qy 1 QSACTLQSETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSSTNCYDGNTWSSTL 60 

Mil:: II mill: || | j : : I : | | || | | | || | : || || II I :|| 
Db 18 QQAGTLTAKRHPSLTWQKCTRGGCPTLNT-TMVLDANWRWTHATSGSTKCYTGNKWQATL 76 

Qy 61 CPDNEXCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLMASDTTYQEF 120 

III :=ll II INI I llhl II ||:: III Mill III! II II 

Db 77 CPDGKSCAANCALDGADYTGTYGITGSGWSLTLQFVTD NVGARAYLMADDTQYQML 132 

Qy 121 TLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDL 180 

II I III hi :|| I II III I :|llll|: Mill Mill IIIIMIIIII 
Db 133 ELLNQELWFDVDMSNIPCGLNGALYLSAMDADGGMRKYPTNKAGAKYATGYCDAQCPRDL 192 

Qy 181 KFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEG 240 

hill mill Ihhll III llllllllllllll :| I llllill: I :||| 
Db 193 KYINGIANVEGWTPSTNDAN-GIGDHGSCCSEMDIWEANKVSTAFTPHPCTTIEQHMCEG 251 

Qy 241 DGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFETSGA- 299 

I lillllhlll II lllhl Ihllhlll I |:||: I MUM I 
Db 252 DSCGGTYSDDRYGVLCDADGCDFNSYRMGNTTFYGEGK--TVDTSSKFTVVTQFIKDSAG 309 

Qy 300 INRYYVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFGG-SSFSDKGGLTQFKK 354 

I :|MII : : : III : :| ::: II h jjli I I 

Db 310 DLAEIKAFYVQNGKVIENSQSNVDGVSGNSITQSFCKSQKTAFGDIDDFNKKGGLKQMGK 369 

Qy 355 ATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPNA 414 

I •■ lllllhllh lllllllilll ill III hllilhh: :|h 

Db 370 ALAQAMVLVMSIWDDHAANMLWLDSTYP VPKVPGAYRGSGPTTSGVPAEVDANAPNS 426 



Qy 415 KVTFSNIKFGPI GSTGNPSGGNPPGGNPPGTTTTRRPATTTGSSP-GPTQSHY 466 

II lllllll : Ihl I II I ::| : :|:| hi I :|: 

Db 427 KVAFSNIKFGHLGISPFSGGSSGTPP-SNPSSSASPTSSTAKPSSTSTASNPSGTGAAHW 485 

Qy 467 GQCGGIGYSGPTVCASGTTCQVLNPYYSQCL 497 

lllllhllll I II : lllh 

Db 486 AQCGGIGFSGPTTCPEPYTCAKDHDIYSQCV 516 

RESULT 13 
JE0313 

exoglucanase (EC 3.2.-.-) - imperfect fungus (Humicola grisea)
C;Species: Humicola grisea
C;Date: 05-Feb-1999 #sequence_revision 05-Feb-1999 #text_change 09-Jul-2004
C;Accession: JE0313
R;Takashima, S.; Iikura, H.; Nakamura, A.; Hidaka, M.; Masaki, H.; Uozumi, T.
J. Biochem. 124, 717-725, 1998
A;Title: Isolation of the gene and characterization of the enzymatic properties of a major exoglucanase of Humicola grisea without a cellulose-binding domain.
A;Reference number: JE0313; MUID:98429588; PMID:9756616
A;Accession: JE0313
A;Status: preliminary
A;Molecule type: DNA
A;Residues: 1-451 <TAK>
A;Cross-references: UNIPROT:O93780; UNIPARC:UPI000005E865; DDBJ:AB003105
C;Superfamily: cellulose 1,4-beta-cellobiosidase I; fungal cellulose-binding domain homology
C;Keywords: glycosidase; hydrolase

Query Match 45.4%; Score 1243.5; DB 2; Length 451; 

Best Local Similarity 52.0%; Pred. No. 3.3e-67; 

Matches 226; Conservative 85; Mismatches 113; Indels 11; Gaps 9; 

Qy 1 QSACTLQSETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSSTNCYDGNTWSSTL 60 

M h :| II :||::|| I I I lllllllll I h llhll hi 

Db 23 QQAGTITAENHPRMTWKRCSGPGNCQTVQGEWIDANWRWLH--NNGQNCYEGNKWTSQ- 79 

Qy 61 CPDNEXCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQ-KNVGARLYLMASDTTYQE 119 

I :|h I Mil I MM :|||:||:: Mh hhl lllh II 

Db 80 CSSATDCAQRCALDGANYQSTYGASTSGDSLTLKFVTKHEYGTNIGSRFYLMANQNKYQM 139 

Qy 120 FTLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRD 179 

Mh llhlllhh: Ihl llllhh Mh: ||:| MMMMIIhll II 
Db 140 FTLMNNEFAFDVDLSKVECGINSALYFVAMEEDGGMASYPSNRAGAKYGTGYCDAQCARD 199 

Qy 180 LKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQ-EIC 238 

MM hlhIM Ihh I hi hlhhhihh : I III I : : II 
Db 200 LKFIGGKANIEGWRPSTNDPNAGVGPMGACCAEIDVWESNAYAYAFTPHACGSKNRYHIC 259 

Qy 239 EGDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFETSG 298 

I : llllllhh I II Mlhlllhll Ml I hll :| llhMI : 
Db 260 ETNNCGGTYSDDRFAGYCDANGCDYNPYRMGNKDFYGKGK--TVDTNRKFTWSRFERN- 316 

Qy 299 AINRYYVQNGVTFQQPNAEL-GSYSGNELNDDYCTAEEAEFGG-SSFSDKGGLTQFKKAT 356 

:-:Mhl : I j : :: : | h I : I- II M 

Db 317 RLSQFFVQDGRKIEVPPPTWPGLPNSADITPELCDAQFRVFDDRNRFAETGGFDALNEAL 376 

Qy 357 SGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPNAKV 416 

: lllllhllh::|||||lhll I : II II I hllllhlhl hhl 
Db 377 TIPMVLVMSIWDDHHSNMLWLDSSYPP-EKAGLPGGDRGPCPTTSGVPAEVEAQYPDAQV 435 

Qy 417 TFSNIKFGPIGSTGN 431 

Mlhlllllll I 
Db 436 VWSNIRFGPIGSTVN 450 



RESULT 13 



Q12621_HUMGT

ID Q12621_HUMGT PRELIMINARY; PRT; 525 AA.
AC Q12621;
DT 01-NOV-1996, integrated into UniProtKB/TrEMBL.
DT 01-NOV-1996, sequence version 1.
DT 07-FEB-2006, entry version 31.
DE Cellulase (EC 3.2.1.91).
GN Name=cbh-1;
OS Humicola grisea var. thermoidea.
OC Eukaryota; Fungi; Ascomycota; mitosporic Ascomycota; Humicola.
OX NCBI_TaxID=5528;
RN [1]
RP NUCLEOTIDE SEQUENCE.
RC STRAIN=IFO9854;
RA Takashima S., Nakamura A., Hidaka M., Masaki H., Uozumi T.;
RT "Cloning, sequencing, and expression of the cellulase genes of
RT Humicola grisea var. thermoidea.";
RL Submitted (JUL-1995) to the EMBL/GenBank/DDBJ databases.
CC -!- FUNCTION: The biological conversion of cellulose to glucose
CC generally requires three types of hydrolytic enzymes: (1)
CC Endoglucanases which cut internal beta-1,4-glucosidic bonds; (2)
CC Exocellobiohydrolases that cut the disaccharide cellobiose from
CC the nonreducing end of the cellulose polymer chain; (3) Beta-1,4-
CC glucosidases which hydrolyze the cellobiose and other short cello-
CC oligosaccharides to glucose (By similarity).
CC -!- CATALYTIC ACTIVITY: Hydrolysis of 1,4-beta-D-glucosidic linkages
CC in cellulose and cellotetraose, releasing cellobiose from the non-
CC reducing ends of the chains.
CC
CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms
CC Distributed under the Creative Commons Attribution-NoDerivs License
CC
DR EMBL; D63515; BAA09785.1; -; Genomic_DNA.
DR HSSP; Q09431; 1GPI.
DR GO; GO:0005576; C:extracellular region; IEA.
DR GO; GO:0016162; F:cellulose 1,4-beta-cellobiosidase activity; IEA.
DR GO; GO:0030248; F:cellulose binding; IEA.
DR GO; GO:0005975; P:carbohydrate metabolism; IEA.
DR GO; GO:0030245; P:cellulose catabolism; IEA.
DR GO; GO:0000272; P:polysaccharide catabolism; IEA.
DR InterPro; IPR000254; CBD_fun.
DR InterPro; IPR001722; Glyco_hydro_7.
DR Pfam; PF00734; CBM_1; 1.
DR Pfam; PF00840; Glyco_hydro_7; 1.
DR PRINTS; PR00734; GLHYDRLASE7.
DR ProDom; PD001821; CBD_fungal; 1.
DR ProDom; PD186135; Glyco_hydro_7; 1.
DR SMART; SM00236; fCBD; 1.
DR PROSITE; PS00562; CBD_FUNGAL; 1.
KW Carbohydrate metabolism; Cellulose degradation; Glycosidase;
KW Hydrolase; Polysaccharide degradation.
SQ SEQUENCE 525 AA; 55722 MW; A2E6E5F40F6D3BB0 CRC64;
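
The database records in this listing use a line-oriented flat-file layout in which each line starts with a two-letter code (ID, AC, DT, DR, KW and so on). As a rough illustration of how such a record could be grouped by line code, here is a minimal Python sketch. The function name `parse_flat_entry` is hypothetical, the sample text is abridged from the record above, and the sketch assumes clean input with one or more spaces after the code; it does not handle extraction damage such as lowercase codes.

```python
from collections import defaultdict


def parse_flat_entry(text):
    """Group the lines of one flat-file record by their two-letter line code."""
    fields = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        code = line[:2]
        # A line code is exactly two uppercase letters...
        if len(code) < 2 or not (code.isalpha() and code.isupper()):
            continue
        # ...followed by whitespace or the end of the line.
        if len(line) > 2 and not line[2].isspace():
            continue
        if code == "XX":  # separator lines carry no data
            continue
        fields[code].append(line[2:].strip())
    return dict(fields)


entry = """ID Q12621_HUMGT PRELIMINARY; PRT; 525 AA.
AC Q12621;
GN Name=cbh-1;
DR Pfam; PF00734; CBM_1; 1.
DR Pfam; PF00840; Glyco_hydro_7; 1.
"""
fields = parse_flat_entry(entry)
print(fields["AC"], len(fields["DR"]))
```

Alignment rows (`Qy`, `Db`) and `RESULT` headers are skipped by the two checks on the line code, so the same function can be run over a whole result block.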



Query Match 60.5%; Score 1658; DB 2; Length 525; 

Best Local Similarity 57.5%; Pred. No. 5.9e-101; 

Matches 295; Conservative 77; Mismatches 119; Indels 22; Gaps 7; 

Qy 1 QSACTLQSETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSSTNCYDGNTWSSTL 60 

I Ihl :| II |:hl|::|| I h HHIIIII : Mill 11 I 

Db 19 QQACSLTTERHPSLSWKKCTAGGQCQTVQASITLDSNWRWTHQVSGSTNCYTGNKWDTSI 78 

Qy 61 CPDNEXCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSA-QKNVGARLYLMASDTTYQE 119 

I I ::||:ilhlll I I I I I : I I : I : I I I : llh Hhl III : II 

Db 79 CTDAKSCAQNCCVDGADYTSTYGITTNGDSLSLKFVTKGQYSTNVGSRTYLMDGEDKYQT 138 

Qy 120 FTLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRD 179 

I lllllhllllll : llllllllllllllllhhil I llllllllllhlllll 




Db 139 FELLGNEFTFDVDVSNIGCGLNGALYFVSMDADGGLSRYPGNKAGAKYGTGYCDAQCPRD 198 



Qy 180 LKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE 239 

Hlllhlhlll hh I I I :hllllllllllh:: I illlll HI II 
Db 199 IKFINGEANIEGWTGSTNDPNAGAGRYGTCCSEMDIWEANNMATAFTPHPCTIIGQSRCE 258 

Qy 240 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETS-- 297

II lllllh II I IMIIIhl II II :||| I hllllhllllll 
Db 259 GDSCGGTYSNERYAGVCDPDGCDFNSYRQGNKTFYGKG--MTVDTTKKITVVTQFLKDAN 316 

Qy 298 ---GAINRYYVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFGG-SSFSDKGGLTQFK 353 

I I hllhl : : II : hi \\ h llh I 

Db 317 GDLGEIKRFYVQDGKIIPNSESTIPGVEGNSITQDWCDRQKVAFGDIDDFNRKGGMKQMG 376 

Qy 354 KATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPN 413 

M U lllllhllh :|||||llhl : : III Ihl hllllhlh::|| 
Db 377 KALAGPMVLVMSIWDDHASNMLWLDSTFPV-DAAGKPGAERGACPTTSGVPAEVEAEAPN 435 

Qy 414 AKVTFSNIKFGPIGST GNPSGGNPPGGNPPGTTTTRRPATTTGSSPGPTQS 464 

: I lllhlllllll I :||||| II llh Mill :| II 
Db 436 SNWFSNIRFGPIGSTVAGLPGAGNGGNNGGNPP PPTTTTSSAPATTTTASAGPKAG 492 

Qy 465 HYGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL 497 

: lllllhHIl I II II :MIII 

Db 493 RWQQCGGIGFTGPTQCEEPYTCTKLNDWYSQCL 525 



RESULT 15 

GUX1_HUMGT

ID GUX1_HUMGT STANDARD; PRT; 525 AA.
AC P15828;
DT 01-APR-1990, integrated into UniProtKB/Swiss-Prot.
DT 01-FEB-1996, sequence version 2.
DT 07-FEB-2006, entry version 55.
DE Exoglucanase 1 precursor (EC 3.2.1.91) (Exoglucanase I)
DE (Exocellobiohydrolase I) (1,4-beta-cellobiohydrolase) (Beta-
DE glucancellobiohydrolase).
GN Name=CBH-1;
OS Humicola grisea var. thermoidea.
OC Eukaryota; Fungi; Ascomycota; mitosporic Ascomycota; Humicola.
OX NCBI_TaxID=5528;
RN [1]
RP NUCLEOTIDE SEQUENCE [GENOMIC DNA].
RX MEDLINE=90175006; PubMed=2308855;
RA de Oliviera Azevedo M., Radford A.;
RT "Sequence of cbh-1 gene of Humicola grisea var. thermoidea.";
RL Nucleic Acids Res. 18:668-668(1990).
CC -!- FUNCTION: The biological conversion of cellulose to glucose
CC generally requires three types of hydrolytic enzymes: (1)
CC Endoglucanases which cut internal beta-1,4-glucosidic bonds; (2)
CC Exocellobiohydrolases that cut the disaccharide cellobiose from
CC the nonreducing end of the cellulose polymer chain; (3) Beta-1,4-
CC glucosidases which hydrolyze the cellobiose and other short cello-
CC oligosaccharides to glucose.
CC -!- CATALYTIC ACTIVITY: Hydrolysis of 1,4-beta-D-glucosidic linkages
CC in cellulose and cellotetraose, releasing cellobiose from the non-
CC reducing ends of the chains.
CC -!- SIMILARITY: Belongs to the glycosyl hydrolase 7 (cellulase C)
CC family.
CC -!- SIMILARITY: Contains 1 CBM1 (fungal-type carbohydrate-binding)
CC domain.
CC
CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms
CC Distributed under the Creative Commons Attribution-NoDerivs License
CC
DR EMBL; X17258; CAA35159.1; -; Genomic_DNA.
DR PIR; S38794; S38794.
DR HSSP; Q09431; 1GPI.
DR InterPro; IPR000254; CBD_fun.
DR InterPro; IPR001722; Glyco_hydro_7.
DR Pfam; PF00734; CBM_1; 1.
DR Pfam; PF00840; Glyco_hydro_7; 1.
DR PRINTS; PR00734; GLHYDRLASE7.
DR ProDom; PD001821; CBD_fungal; 1.
DR ProDom; PD186135; Glyco_hydro_7; 1.
DR SMART; SM00236; fCBD; 1.
DR PROSITE; PS00562; CBM1_1; 1.
DR PROSITE; PS51164; CBM1_2; 1.
KW Carbohydrate metabolism; Cellulose degradation; Glycoprotein;
KW Glycosidase; Hydrolase; Polysaccharide degradation; Signal.



FT   SIGNAL        1     18       Potential.
FT   CHAIN        19    525       Exoglucanase 1.
FT                                /FTId=PRO_0000007921.
FT   DOMAIN      489    525       CBM1.
FT   REGION       19    467       Catalytic.
FT   REGION      468    489       Linker.
FT   ACT_SITE    231    231       Nucleophile (By similarity).
FT   ACT_SITE    236    236       Proton donor (By similarity).
FT   CARBOHYD    289    289       N-linked (GlcNAc...) (Potential).
FT   DISULFID    497    514       By similarity.
FT   DISULFID    508    524       By similarity.
SQ   SEQUENCE 525 AA; 55694 MW; A6684D4CF881E090 CRC64;

Query Match 60.3%; Score 1652; DB 1; Length 525;

Best Local Similarity 57.3%; Pred. No. 1.5e-100;

Matches 294; Conservative 76; Mismatches 121; Indels 22; Gaps 7;

Qy 1 QSACTLQSETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSSTNCYDGNTWSSTL 60

Ihl :| M hi lh:|l I h :hllllll : Mill II I :::

Db 19 QQACSLTTERHPSLSWNKCTAGGQCQTVQASITLDSNWRWTHQVSGSTNCYTGNKWDTSI 78

Qy 61 CPDNEXCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQK-NVGARLYLMASDTTYQE 119

I I ::|hllhlll I IllhlhhIlh III: llhl III : II

Db 79 CTDAKSCAQNCCVDGADYTSTYGITTNGDSLSLKFVTKGQHSTNVGSRTYLMDGEDKYQT 138

Qy 120 FTLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRD 179

I IIMIhllllll : IIIIMIIMIIIIIIhhIl I llllllllllhlllll

Db 139 FELLGNEFTFDVDVSNIGCGLNGALYFVSMDADGGLSRYPGNKAGAKYGTGYCDAQCPRD 198

Qy 180 LKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE 239

Hlllhlhlll |:|: I I I :|:|||||||lllh:: I llllll HI II
Db 199 IKFINGEANIEGWTGSTNDPNAGAGRYGTCCSEMDIWEANNMATAFTPHPCTIIGQSRCE 258

Qy 240 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFETS-- 297

II lllllh II I llllll|:| II II :||| I hllllhllllll
Db 259 GDSCGGTYSNERYAGVCDPDGCDFNSYRQGNKTFYGKG--MTVDTTKKITWTQFLKDAN 316

Qy 298 GAINRYYVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFGG-SSFSDKGGLTQFK 353

I I hllhl : : II : hi :: || h llh I

Db 317 GDLGEIKRFYVQDGKIIPNSESTIPGVEGNSITQDWCDRQKVAFGDIDDFNRKGGMKQMG 376

Qy 354 KATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPN 413

II H lllllhllh :|||llllhl : : I I I I h I h II I I h I h : : II
Db 377 KALAGPMVLVMSIWDDHASNMLWLDSTFPV-DAAGKPGAERGACPTTSGVPAEVEAEAPN 435

Qy 414 AKVTFSNIKFGPIGST GNPSGGNPPGGNPPGTTTTRRPATTTGSSPGPTQS 464

U lllhllillll I :||||| II llh Mill :| II
Db 436 SNWFSNIRFGPIGSTVAGLPGAGNGGNNGGNPP PPTTTTSSAPATTTTASAGPKAG 492

Qy 465 HYGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL 497

MlllhMM I I II MUM

Db 493 RWQQCGGIGFTGPTQCEEPYICTKLNDWYSQCL 525



Title: 



US-10-804-785-2 (Thr66 deleted) 



RESULT 170 
ABJ26900 

ID ABJ26900 Standard; protein; 460 AA. 
XX 

AC ABJ26900; 

XX 

DT 23-OCT-2003 (revised) 

DT 08-MAY-2003 (first entry) 

XX 

DE Cellobiohydrolase I activity protein SEQ ID No 52.
XX
KW Cellobiohydrolase; enzyme; DNA shuffling; ethanol; biomass;
KW cellobiohydrolase I; EC 3.2.1.91.
XX
OS Coprinopsis cinerea.
XX
PN WO2003000941-A2.
XX
PD 03-JAN-2003.
XX
PF 26-JUN-2002; 2002WO-DK000429.
XX
PR 26-JUN-2001; 2001DK-00001000.
XX
PA (NOVO ) NOVOZYMES AS.
XX 

PI Lange L, Wu W, Aubert D, Landvik S, Schnorr KM, Clausen IG; 

XX 

DR WPI; 2003-278244/27. 

DR N-PSDB; ABT23538. 

XX 

PT New polypeptide with cellobiohydrolase I activity, useful in producing 

PT ethanol from biomass. 

XX 

PS Claim 4; Page 175-177; 199pp; English. 
XX 

CC The invention relates to a novel polypeptide comprising: part of any of 

CC 21 amino acid sequences; an amino acid sequence at least 70% identical to 

CC a polypeptide encoded by a cellobiohydrolase gene; an amino acid sequence 

CC at least 80% identical to the polypeptide encoded by 21 nucleotide 

CC sequences; a polypeptide encoded by a nucleotide sequence which 

CC hybridises with a probe selected from complementary strands of 55 

CC nucleotide sequences; or a fragment of the aforementioned structures. The 

CC polynucleotides of the invention are useful in a method of DNA shuffling. 

CC The polypeptides are useful in a method for producing ethanol from 

CC biomass comprising contacting the biomass with the polypeptides. This 

CC sequence represents a protein with cellobiohydrolase I activity of the 

CC invention. (Updated on 23-OCT-2003 to standardise OS field) 

XX 

SQ Sequence 460 AA; 

Query Match 46.0%; Score 1259; DB 6; Length 460; 
Best Local Similarity 53.5%; Pred. No. 1.9e-73; 

Matches 234; Conservative 64; Mismatches 117; Indels 22; Gaps 9; 

Qy 8 SETHPPLTWQKCSSGGTC-TQQTGSWIDANWRWTHATNSSTNCYDGNTWSSTLCPD-NE 65 

:| II I Ihh III I Ihllllll I h III! Ihhihl I 

Db 26 AENHPRLPWQRCTRNGGCQTVSNGQWLDANWRWLHVTDGYTNCYTGNSWNSTVCSDPTT 85 

Qy 66 CAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLMASDTTYQEFTLLGNE 125 

Ih I hll I llhlhh:hl hhi I llllhlll :: II III I 

Db 86 CAQRCALEGANYQQTYGITTNGDALTIKFLTRSQQTNVGARVYLMENENRYQMFNLLNKE 145 

Qy 126 FSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQ 185 

hlllllh:|lhlllllh lllllhll I I IIIIIMIIIMIIIIhllhl 



Db 146 FTFDVDVSKVPCGINGALYFIQMDADGGMSKQPNNRAGAKYGTGYCDSQCPRDIKFIDGV 205



Qy 186 ANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTV---GQEICEGDGC 242 

II I II : I I I :| Ihlllllllllll I Mill I I : III I 

Db 206 ANSADWTPSETDPNAGRGRYGICCAEMDIWEANSISNAYTPHPCRTQNDGGYQRCEGRDC 265 

Qy 243 GGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFET SG 297 

: II I llllllhlhhil lllll hi! :|:|IIMI I :| 
Db 266 NQPRYEGLCDPDGCDYNPFRMGNKDFYGPGK--TVDTNRKMTVVTQFITHDNTDTG 319 

Qy 298 A---INRYYVQNGVTFQQPNAELGSY--SGNELNDDYCTAEEAEFGG-SSFSDKGGLTQF 351

I I llhl I : : : : :|| :: II |||: i|| 

Db 320 TLVDIRRLYVQDGRVIANPPTNFPGLMPAHDSITEQFCTDQKNLFGDYSSFARDGGLAHM 379 

Qy 352 KKATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSP 411

: I II :h|:|: hllllll llh : II Ihl h II : I I 
Db 380 GRSLAKGHVLALSIWNDHGAHMLWLDSNYPTDADPNKPGIARGTCPTTGGTPRETEQNHP 439 

Qy 412 NAKVTFSNIKFGPIGST 428 

:hl lllllll MM 
Db 440 DAQVIFSNIKFGDIGST 456 



RESULT 171 




AAR94351

ID AAR94351 standard; protein; 451 AA.
XX
AC AAR94351;
XX
DT 29-AUG-1996 (first entry)
XX
DE Humicola insolens cellulase.
XX
KW Cellulase; detergents; textile auxiliaries; feed additives;
KW digestive agents; host cell; recombinant production.
XX
OS Humicola insolens.
XX
FH Key Location/Qualifiers
FT Peptide 1..22
FT /label= sig_peptide
FT Peptide 23..451
FT /label= mat_peptide
XX
PN JP08056663-A.
XX
PD 05-MAR-1996.
XX
PF 29-AUG-1994; 94JP-00203564.
XX
PR 29-AUG-1994; 94JP-00203564.
XX
PA (MEIJ ) MEIJI SEIKA KAISHA LTD.
XX
DR WPI; 1996-182296/19.
DR N-PSDB; AAT13426.
XX
PT Humicola insolens cellulase - used as main component in detergents,
PT textile auxiliaries, feed additives and digestive agents.
XX
PS Claim 2; Page 9-10; 16pp; Japanese.
XX
CC The present sequence is H. insolens cellulase, which is used as the main
CC component in detergents, textile auxiliaries, feed additives and
CC digestive agents. A host cell transformed with a vector contg. the
CC cellulase DNA, can be used for the recombinant prodn. of the cellulase
XX
SQ Sequence 451 AA;



Query Match 45.6%; Score 1250; DB 2; Length 451;

Best Local Similarity 52.1%; Pred. No. 7e-73; 

Matches 226; Conservative 84; Mismatches 114; Indels 10; Gaps 8; 

Qy 1 QSACTLQSETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSSTNCYDGNTWSSTL 60 

I I h :| II Uh:|l I I I lllllllll I j: llhll hi 

Db 23 QQAGTITAENHPRMTWKRCSGPGNCQTVQGEWIDANWRWLH--NNGQNCYEGNKWTSQC 80 

Qy 61 CPDNECAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQ-KNVGARLYLMASDTTYQEF 119 

HI: I MM I MM :|||:||:: llh hhl lllh II I 

Db 81 SSATDCAQRCALDGANYQSTYGASTSGDSLTLKFVTKHEYGTNIGSRFYLMANQNKYQMF 140 

Qy 120 TLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDL 179 

Ih llhlllhh: Ihl IIHhh III:: Ihl llllllllllhll III 
Db 141 TLMNNEFAFDVDLSKVECGINSALYFVAMEEDGGMASYPSNRAGAKYGTGYCDAQCARDL 200 

Qy 180 KFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQ-EICE 238 

III hlhlll l|:|: I. hi hlhhhlhh : I III I : : Ml 
Db 201 KFIGGKANIEGWRPSTNDPNAGVGPMGACCAEIDVWESNAYAYAFTPHACGSKNRYHICE 260 

Qy 239 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFETSGA 298 

: llllllhh I II :|lhlM|:|| III I Mil :| IM::|I : 
Db 261 TNNCGGTYSDDRFAGYCDANGCDYNPYRMGNKDFYGKGK--TVDTNRKFTWSRFERN-R 317

Qy 299 INRYYVQNGVTFQQPNAEL-GSYSGNELNDDYCTAEEAEFGG-SSFSDKGGLTQFKKATS 356 

:::::||:| : I | : :: : | h I : h: II =1 : 

Db 318 LSQFFVQDGRKIEVPPPTWPGLPNSADITPELCDAQFRVFDDRNRFAETGGFDALNEALT 377 

Qy 357 GGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVT 416 

lllll|:|l|:::||llllhll I : II II I hllllhlhl Hhl 
Db 378 IPMVLVMSIWDDHHSNMLWLDSSYPP-EKAGLPGGDRGPCPTTSGVPAEVEAQYPNAQW 436 

Qy 417 FSNIKFGPIGSTGN 430 

:|||:|IMIII I 
Db 437 WSNIRFGPIGSTVN 450 
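
Note on the summary figures: in the hits listed here, the "Best Local Similarity" value agrees with Matches divided by the total number of aligned columns (Matches + Conservative + Mismatches + Indels). This is an observation drawn from the numbers in this listing, not a documented GenCore formula, and the function name below is hypothetical. A minimal sketch under that assumption:

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Percent identity over all aligned columns (assumed definition)."""
    # Assumption: every aligned column is counted exactly once in one of
    # the four categories reported on the "Matches ...;" summary line.
    columns = matches + conservative + mismatches + indels
    return round(100.0 * matches / columns, 1)


# Counts taken from the summary line of the alignment above.
print(best_local_similarity(226, 84, 114, 10))  # 52.1
```

The same calculation can be used as a quick consistency check on summary lines whose digits may have been damaged in scanning.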



RESULT 172 

AAW44852

ID AAW44852 standard; protein; 451 AA. 
XX 

AC AAW44852; 
XX 

DT 31-JUL-1998 (first entry) 
XX 

DE Humicola insolens cellulase NCE1 protein.
XX
KW Humicola insolens; NCE1; NCE2; NCE4; cellulase; expression vector;
KW promoter; signal sequence; terminator; amylase; lipase; protease;
KW phytase.

XX 

OS Humicola insolens. 
XX 

FH Key Location/Qualifiers 

FT Peptide 1..21
FT /label= signal
FT Protein 22..451
FT /label= Cellulase_NCE1

XX 

PN WO9803667-A1. 
XX 

PD 29-JAN-1998. 
XX 

PF 24-JUL-1997; 97WO-JP002560.
XX

PR 24-JUL-1996; 96JP-00195070.
XX 

PA (MEIJ ) MEIJI SEIKA KAISHA LTD. 
XX 

PI Moriya T, Murashima K, Aoyagi K, Sumida N, Watanabe M, Hamaya T; 

PI Koga J, Kono T, Murakami T; 

XX 

DR WPI; 1998-120786/11. 

DR N-PSDB; AAV19376. 
XX 

PT Mass production of proteins and peptides in Humicola species - using 

PT expression vector containing the promoter, signal sequence and/or 

PT terminator from the Humicola insolens NCE1 or NCE2 gene.

XX 

PS Claim 8; Page 34-39; 63pp; Japanese. 
XX 

CC The present sequence represents the Humicola insolens cellulase NCE1
CC protein from the present invention. The present invention describes a
CC method for the mass production of proteins and peptides in Humicola
CC species, especially in Humicola insolens, using an expression vector
CC which comprises the promoter, signal sequence and/or terminator
CC regulatory sequences from the NCE1 or NCE2 gene of H. insolens. These are
CC available in the plasmids pM3-1 (Escherichia coli JM109/pM3-1,
CC FERM BP-5971) (for NCE1) and pM14-1 (E. coli JM109/pM14-1, FERM BP-5972) (for
CC NCE2). The vector also contains a marker gene such as an antibiotic
CC resistance gene (e.g. the destomycin resistance gene from Streptomyces
CC rimofaciens). Proteins which can be expressed using this system include
CC cellulase, amylase, lipase, protease, phytase and other enzymes. Specific
CC expression vectors of the invention are pMKD01 (for Humicola NCE3
CC cellulase gene), pEGD01 (for Humicola NCE4 cellulase gene) and pIED02
CC (for Humicola NCE4 cellulase gene). The expression system allows the
CC efficient production of proteins and peptides in a Humicola host. Using
CC the expression system high amounts of protein (>4.5 g/l) can be obtained
XX 

SQ Sequence 451 AA; 

Query Match 45.6%; Score 1250; DB 2; Length 451; 

Best Local Similarity 52.1%; Pred. No. 7e-73; 

Matches 226; Conservative 84; Mismatches 114; Indels 10; Gaps 8; 

Qy 1 QSACTLQSETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSSTNCYDGNTWSSTL 60 

I I h H II :|h:|l I I I IIIIIIMI I |: llhll hi 

Db 23 QQAGTITAENHPRMTWKRCSGPGNCQTVQGEWIDANWRWLH--NNGQNCYEGNKWTSQC 80 

Qy 61 CPDNECAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQ-KNVGARLYLMASDTTYQEF 119 

:|h I MM I MM Mlhlh: llh hhl MM: II I 

Db 81 SSATDCAQRCALDGANYQSTYGASTSGDSLTLKFVTKHEYGTNIGSRFYLMANQNKYQMF 140 

Qy 120 TLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDL 179 

Ih llhlllhl:: Ihl llllhh Ml:: ||:| jlMMIMIhll III 
Db 141 TLMNNEFAFDVDLSKVECGINSALYFVAMEEDGGMASYPSNRAGAKYGTGYCDAQCARDL 200 

Qy 180 KFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQ-EICE 238 

III hlhlll l|:|: I hi hlhhhihh : I Mil : : III 
Db 201 KFIGGKANIEGWRPSTNDPNAGVGPMGACCAEIDVWESNAYAYAFTPHACGSKNRYHICE 260 

Qy 239 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFETSGA 298 

^ MIMIhh I II :|||:||l|:|| III I hll :| llh:|l : 
Db 261 TNNCGGTYSDDRFAGYCDANGCDYNPYRMGNKDFYGKGK--TVDTNRKFTWSRFERN-R 317 

Qy 299 INRYYVQNGVTFQQPNAEL-GSYSGNELNDDYCTAEEAEFGG-SSFSDKGGLTQFKKATS 356 

::-:|hl : I | : :: : | |: | : h: II :| : 

Db 318 LSQFFVQDGRKIEVPPPTWPGLPNSADITPELCDAQFRVFDDRNRFAETGGFDALNEALT 377 

Qy 357 GGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVT 416 

llllll=llh::|lillli:|| I : || || | hlMlhlhl |||:| 
Db 378 IPMVLVMSIWDDHHSNMLWLDSSYPP-EKAGLPGGDRGPCPTTSGVPAEVEAQYPNAQW 436 



Qy 417 FSNIKFGPIGSTGN 430 

:|l|:||||||| I 
Db 437 WSNIRFGPIGSTVN 450 



RESULT 173 
ABJ26887 

ID ABJ26887 standard; protein; 451 AA.
XX 

AC ABJ26887; 
XX 

DT 08-MAY-2003 (first entry)
XX 

DE Cellobiohydrolase I activity protein SEQ ID No 6.
XX 

KW Cellobiohydrolase; enzyme; DNA shuffling; ethanol; biomass; 

KW cellobiohydrolase I; EC 3.2.1.91. 

XX 

OS Scytalidium sp. 
XX 

PN WO2003000941-A2. 
XX 

PD 03-JAN-2003. 
XX 

PF 26-JUN-2002; 2002WO-DK000429.
XX

PR 26-JUN-2001; 2001DK-00001000.
XX 

PA (NOVO ) NOVOZYMES AS. 
XX 

PI Lange L, Wu W, Aubert D, Landvik S, Schnorr KM, Clausen IG; 

XX 

DR WPI; 2003-278244/27. 

DR N-PSDB; ABT23505. 
XX 

PT New polypeptide with cellobiohydrolase I activity, useful in producing 

PT ethanol from biomass. 

XX 

PS Claim 4; Page 119-121; 199pp; English. 
XX 

CC The invention relates to a novel polypeptide comprising: part of any of 

CC 21 amino acid sequences; an amino acid sequence at least 70% identical to

CC a polypeptide encoded by a cellobiohydrolase gene; an amino acid sequence 

CC at least 80% identical to the polypeptide encoded by 21 nucleotide 

CC sequences; a polypeptide encoded by a nucleotide sequence which 

CC hybridises with a probe selected from complementary strands of 55 

CC nucleotide sequences; or a fragment of the aforementioned structures. The 

CC polynucleotides of the invention are useful in a method of DNA shuffling. 

CC The polypeptides are useful in a method for producing ethanol from 

CC biomass comprising contacting the biomass with the polypeptides. This 

CC sequence represents a protein with cellobiohydrolase I activity of the 

CC invention 

XX 

SQ Sequence 451 AA; 

Query Match 45.5%; Score 1245; DB 6; Length 451; 

Best Local Similarity 51.8%; Pred. No. 1.5e-72; 

Matches 225; Conservative 85; Mismatches 114; Indels 10; Gaps 8; 

Qy 1 QSACTLQSETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSSTNCYDGNTWSSTL 60 

I I h :| II :|h:|l I I I lllllllll I h Mhll hi 

Db 23 QQAGTITAENHPRMTWKRCSGPGNCQTVQGEWIDANWRWLH--NNGQNCYEGNKWTSQC 80 

Qy 61 CPDNECAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQ-KNVGARLYLMASDTTYQEF 119 

:|h I INI I MM MMMh: Ml: hhl MM: II I 

Db 81 SSATDCAQRCALDGANYQSTYGASTSGDSLTLKFVTKHEYGTNIGSRFYLMANQNKYQMF 140 



Qy 120 TLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDL 179

Ih llhlllhh: ihl lll||:|: |||:: ||:| | I I I I I I I I | | : I I III
Db 141 TLMNNEFAFDVDLSKVECGINSALYFVAMEEDGGMASYPSNRAGAKYGTGYCDAQCARDL 200



Qy 180 KFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQ-EICE 238 

III hlhlll Ihh I hi hlhhhihh : I III I : : III 
Db 201 KFIGGKANIEGWRPSTNDPNAGVGPMGACCAEIDVWESNAYAYAFTPHACGSKNRYHICE 260 

Qy 239 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFETSGA 298 

: llllllhh I II :|lhlllhll III I hll :| llhHI : 
Db 261 TNNCGGTYSDDRFAGYCDANGCDYNPYRMGNKDFYGKGK--TVDTNRKFTWSRFERN-R 317

Qy 299 INRYYVQNGVTFQQPNAEL-GSYSGNELNDDYCTAEEAEFGG-SSFSDKGGLTQFKKATS 356 

:::::|hl : I | : :: : | |: | : |:: II U : 

Db 318 LSQFFVQDGRKIEVPPPTWPGLPNSADITPELCDAQFRVFDDRNRFAETGGFDALNEALT 377 

Qy 357 GGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVT 416 

lllllhllh::Mllllhll I : M III hllllhlhl hhl 
Db 378 IPMVLVMSIWDDHHSNMLWLDSSYPP-EKAGLPGGDRGPCPTTSGVPAEVEAQYPDAQW 436

Qy 417 FSNIKFGPIGSTGN 430 

Hlhlllllll I 
Db 437 WSNIRFGPIGSTVN 450 



RESULT 9 

US-09-463-712C-10 

; Sequence 10, Application US/09463712C
; Patent No. 6558937
; GENERAL INFORMATION:
; APPLICANT: DSM, N.V.
; APPLICANT: Gielkens, Marcus
; APPLICANT: Vesser, Jacob
; APPLICANT: De Graaff, Leendert
; TITLE OF INVENTION: CELLULOSE DEGRADING ENZYMES OF
; TITLE OF INVENTION: ASPERGILLUS
; FILE REFERENCE: 24615-20135.00
; CURRENT APPLICATION NUMBER: US/09/463,712C
; CURRENT FILING DATE: 2000-04-04
; PRIOR APPLICATION NUMBER: PCT/EP98/05047
; PRIOR FILING DATE: 1998-07-31
; NUMBER OF SEQ ID NOS: 14
; SOFTWARE: FastSEQ for Windows Version 4.0
; SEQ ID NO 10
; LENGTH: 536
; TYPE: PRT
; ORGANISM: Aspergillus niger
US-09-463-712C-10 

Query Match 61.1%; Score 1673.5; DB 2; Length 536; 

Best Local Similarity 59.6%; Pred. No. 2.3e-122; 

Matches 308; Conservative 61; Mismatches 125; Indels 23; Gaps 7; 

Qy 1 QSACTLQSETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSSTNCYDGNTWSSTL 60 

I I Hill MM hi hll I lllllllll hhhllM II I 
Db 22 QQVGTYTTETHPSLTWQTCTSDGSCTTNDGEWIDANWRWVHSTSSATNCYTGNEWDTSI 81 

Qy 61 CPDN-ECAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLMASDTTYQEF 119 

I h II II MM I MIIIIMh I : MM : IhhIMIh h h I 

Db 82 CTDDVTCAANCALDGATYEATYGVTTSGSELRLNFVTQGSSKNIGSRLYLMSDDSNYELF 141 

Qy 120 TLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDL 179 

Ml Ihllllll IMIIMIIIIhllMM hi I IIIMIIIIIIIIIIIII 

Db 142 KLLGQEFTFDVDVSNLPCGLNGALYFVAMDADGGTSEYSGNKAGAKYGTGYCDSQCPRDL 201 

Qy 180 KFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEG 239 



IliilHI HIIIIIII llhl llllhllhllillli I I III :| I :|:| 
Db 202 KFINGEANCDGWEPSSNNVNTGVGDHGSCCAEMDVWEANSISNAFTAHPCDSVSQTMCDG 261 



Qy 240 DGCGGTY--SDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFET-- 295

Db

Qy 296 ------SGAINRYYVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFGGSSFSDK-GGL 348

Db

Qy 349 TQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVES 408

Db 380

Qy 409

Db 440

Qy 460 PTQSHYGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL 496

Db 500 SAAQAYGQCGGQGWTGPTTCVSGYTCTYEDAYYSQCL 536




RESULT 10 
US-08-676-166A-3 

; Sequence 3, Application US/08676166A
; Patent No. 5955270
; GENERAL INFORMATION:
; APPLICANT: Radford, Alan
; APPLICANT: Parish, John H.
; TITLE OF INVENTION: EXPLOITATION OF THE CELLULASE COMPLEX OF
; TITLE OF INVENTION: NEUROSPORA
; NUMBER OF SEQUENCES: 7
; CORRESPONDENCE ADDRESS:
; ADDRESSEE: David A. Jackson, Esq.
; STREET: 411 Hackensack Ave, Continental Plaza, 4th
; STREET: Floor
; CITY: Hackensack
; STATE: New Jersey
; COUNTRY: USA
; ZIP: 07601
; COMPUTER READABLE FORM:
; MEDIUM TYPE: Floppy disk
; COMPUTER: IBM PC compatible
; OPERATING SYSTEM: PC-DOS/MS-DOS
; SOFTWARE: PatentIn Release #1.0, Version #1.30
; CURRENT APPLICATION DATA:
; APPLICATION NUMBER: US/08/676,166A
; FILING DATE: 15-JUL-1996
; CLASSIFICATION: 435
; ATTORNEY/AGENT INFORMATION:
; NAME: Jackson Esq., David A.
; REGISTRATION NUMBER: 26,742
; REFERENCE/DOCKET NUMBER: 1321-1-002
; TELECOMMUNICATION INFORMATION:
; TELEPHONE: 201-487-5800
; TELEFAX: 201-343-1684
; INFORMATION FOR SEQ ID NO: 3:
; SEQUENCE CHARACTERISTICS:
; LENGTH: 525 amino acids
; TYPE: amino acid
; STRANDEDNESS: single
; TOPOLOGY: linear
; MOLECULE TYPE: protein
; HYPOTHETICAL: NO
; ORIGINAL SOURCE:
; ORGANISM: H. grisea
US-08-676-166A-3 



Query Match 60.1%; Score 1647.5; DB 1; Length 525; 

Best Local Similarity 57.5%; Pred. No. 2.4e-120; 

Matches 295; Conservative 75; Mismatches 120; Indels 23; Gaps 8; 

Qy 1 QSACTLQSETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSSTNCYDGNTWSSTL 60 

I Ihl :| II hhlh:|i I h :|:|||||l ^ Mill II I 

Db 19 QQACSLTTERHPSLSWKKCTAGGQCQTVQASITLDSNWRWTHQVSGSTNCYTGNKWDTSI 78 



Qy 61 CPD-NECAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSA-QKNVGARLYLMASDTTYQE 118

I I Ihllhlll I lllhlhhllh III: Ilhl III : II 

Db 79 CTDAKSCAQNCCVDGADYTSTYGITTNGDSLSLKFVTKGQYSTNVGSRTYLMDGEDKYQT 138 

Qy 119 FTLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRD 178 

I lllllhllllll : llllllllllllllllhhil I llllllllllhlllll 

Db 139 FELLGNEFTFDVDVSNIGCGLNGALYFVSMDADGGLSRYPGNKAGAKYGTGYCDAQCPRD 198 


Qy 179 LKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE 238 

Hlllhlhlll hh I I I :|:||||||llllh:: I llllll HI II 
Db 199 IKFINGEANIEGWTGSTNDPNAGAGRYGTCCSEMDIWEANNMATAFTPHPCTIIGQSRCE 258 

Qy 239 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETS-- 296

II lllllh II I llllllhl II II :|ll I hllllhllllll 

Db 259 GDSCGGTYSNERYAGVCDPDGCDFNSYRQGNKTFYGKG--MTVDTTKKITWTQFLKDAN 316

Qy 297 GAINRYYVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFGG-SSFSDKGGLTQFK 352 

I I hllhl : : m hi II h llh I 

Db 317 GDLGEIKRFYVQDGKIIPNSESTIPGVEGNSITQDWCDRQKVAFGDIDDFNRKGGMKQMG 376 

Qy 353 KATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPN 412 

II :| lllllhllh Ullllllhl : : III Ihl hllllhlh^HI 
Db 377 KALAGPMVLVMSIWDDHASNMLWLDSTFPV-DAAGKPGAERGACPTTSGVPAEVEAEAPN 435

Qy 413 AKVTFSNIKFGPIGST GNPSGGNPPGGNPPGTTTTRRPATTTGSSPGPTQS 463 

= I lllhlllllll I :||||| II llh Mill M II 

Db 436 SNWFSNIRFGPIGSTVAGLPGAGNGGNNGGNPP PPTTTTSSAPATTTTASAGPKAG 492 

Qy 464 HYGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL 496 

: lllllh:|ll I M II :||lll 

Db 493 RWQQCGGIGFTGPTQCEEPYTCTKLNDWYSQCL 525



RESULT 19 
US-09-329-350-35 

; Sequence 35, Application US/09329350
; Patent No. 6184019
; GENERAL INFORMATION:
; APPLICANT: Miettinen-Oinonen, Arja
; APPLICANT: Londesborough, John
; APPLICANT: Vehmaanperä, Jari
; APPLICANT: Haakana, Heli
; APPLICANT: Mäntylä, Arja
; APPLICANT: Lantto, Raija
; APPLICANT: Elovainio, Minna
; APPLICANT: Joutsjoki, Vesa
; APPLICANT: Paloheimo, Marja
; APPLICANT: Suominen, Pirkko
; TITLE OF INVENTION: NOVEL CELLULASES, THE GENES ENCODING THEM AND
; TITLE OF INVENTION: USES THEREOF
; NUMBER OF SEQUENCES: 45
; CORRESPONDENCE ADDRESS:
; ADDRESSEE: Sterne, Kessler, Goldstein & Fox P.L.L.C.
; STREET: 1100 New York Avenue, N.W., Suite 600
; CITY: Washington
; STATE: D.C.
; COUNTRY: USA
; ZIP: 20005
; COMPUTER READABLE FORM:
; MEDIUM TYPE: Diskette, 3.50 inch
; COMPUTER: IBM PC compatible
; OPERATING SYSTEM: PC-DOS/MS-DOS
; SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)
; CURRENT APPLICATION DATA:
; APPLICATION NUMBER: US/09/329,350
; FILING DATE: Herewith
; CLASSIFICATION:
; PRIOR APPLICATION DATA:
; APPLICATION NUMBER: US 08/841,636
; FILING DATE: 30-APR-1997
; PRIOR APPLICATION DATA:
; APPLICATION NUMBER: US 60/005,335
; FILING DATE: 17-OCT-1995
; PRIOR APPLICATION DATA:
; APPLICATION NUMBER: US 60/007,926
; FILING DATE: 04-DEC-1995
; PRIOR APPLICATION DATA:
; APPLICATION NUMBER: US 60/020,840
; FILING DATE: 28-JUN-1996
; PRIOR APPLICATION DATA:
; APPLICATION NUMBER: US 08/732,181
; FILING DATE: 16-OCT-1996
; PRIOR APPLICATION DATA:
; APPLICATION NUMBER: PCT/FI96/00550
; FILING DATE: 17-OCT-1996
; ATTORNEY/AGENT INFORMATION:
; NAME: Shea Jr., Timothy
; REGISTRATION NUMBER: 41,306
; REFERENCE/DOCKET NUMBER: 1716.0510006/MAC/TJS
; TELECOMMUNICATION INFORMATION:
; TELEPHONE: (202)371-2600
; TELEFAX: (202)371-2540
; INFORMATION FOR SEQ ID NO: 35:
; SEQUENCE CHARACTERISTICS:
; LENGTH: 452 amino acids
; TYPE: amino acid
; STRANDEDNESS:
; TOPOLOGY: linear
; MOLECULE TYPE: protein
; ORIGINAL SOURCE:
; ORGANISM: Melanocarpus albomyces
; STRAIN: ALKO4237
; FEATURE:
; NAME/KEY: Protein
; LOCATION: 1..452
; OTHER INFORMATION: /label= 50K-cellulase-B
US-09-329-350-35 

Query Match 44.6%; Score 1221.5; DB 2; Length 452; 

Best Local Similarity 51.8%; Pred. No. 3.5e-87; 

Matches 220; Conservative 76; Mismatches 118; Indels 11; Gaps 9; 



Qy 9 ETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSSTNCYDGNTWSSTLCPDNECAK 68 

I llllllhh: I I IIIIIIIM I I mill |:: :||: 

Db 31 ENHPPLTWQRCTAPGNCQTVNAEWIDANWRWLHDDNMQ-NCYDGNQWTNACSTATDCAE 89 

Qy 69 NCCLDGAA-YASTYGVTTSGNSLSIGFVTQSAQ-KNVGARLYLMASDTTYQEFTLLGNEF 126 

I ^Hl I Ml Hlh:h: llh llhl III II I hill 

Db 90 KCMIEGAGDYLGTYGASTSGDALTLKFVTKHEYGTNVGSRFYLMNGPDKYQMFNLMGNEL 149 

Qy 127 SFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQA 186 

Hllhl : l|:| llllhh III:: Ihl MhllllMhll |N||: |:| 
Db 150 AFDVDLSTVECGINSALYFVAMEEDGGMASYPSNQAGARYGTGYCDAQCARDLKFVGGKA 209 




Qy 187 NVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEGDGCGGTY 246

hllh h:: I hi :|||hhhlhh = I III III =11 Mill 

Db 210 NIEGWKSSTSDPNAGVGPYGSCCAEIDVWESNAYAFAFTPHACTTNEYHVCETTNCGGTY 269 

Qy 247 SDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFETSGAINRYYVQN 306 

h:h I II :| I hi III I I I I I : : I I I I : : I I :::|::|: 

Db 270 SEDRFAGKCDANGCDYNPYRMGNPDFYGKGK--TLDTSRKFTWSRFE-ENKLSQYFIQD 326 

Qy 307 G--VTFQQPNAELGSYSGNELNDDYCTAEEAEFGG-SSFSDKGGLTQFKKATSGGMVLVM 363 

I : I II : :|: : |: I : I : M I I Mill 

Db 327 GRKIEIPPPTWE-GMPNSSEITPELCSTMFDVFNDRNRFEEVGGFEQLNNALRVPMVLVM 385 

Qy 364 SLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFG 423 

hllhlllllllll II I III II I I llllhlhl hhl :|lhll 

Db 386 SIWDDHYANMLWLDSIYPP-EKEGQPGAARGDCPTDSGVPAEVEAQFPDAQWWSNIRFG 444 

Qy 424 PIGST 428

Mill 

Db 445 PIGST 449 



RESULT 33 
US-08-709-974A-11 

; Sequence 11, Application US/08709974A
; Patent No. 6117664
; GENERAL INFORMATION:
; APPLICANT: Schülein, Martin
; APPLICANT: Rosholm, Peter
; APPLICANT: Nielsen, Jack Bech
; APPLICANT: Hansen, Svend Aage
; APPLICANT: von der Osten, Claus
; TITLE OF INVENTION: Novel Alkaline Cellulases
; NUMBER OF SEQUENCES: 11
; CORRESPONDENCE ADDRESS:
; ADDRESSEE: Novo Nordisk of North America, Inc.
; STREET: 405 Lexington Avenue, 64th Floor
; CITY: New York
; STATE: New York
; COUNTRY: United States of America
; ZIP: 10174-6401
; COMPUTER READABLE FORM:
; MEDIUM TYPE: Floppy disk
; COMPUTER: IBM PC compatible
; OPERATING SYSTEM: PC-DOS/MS-DOS
; SOFTWARE: PatentIn Release #1.0, Version #1.30
; CURRENT APPLICATION DATA:
; APPLICATION NUMBER: US/08/709,974A
; FILING DATE: 09-SEP-1996
; CLASSIFICATION: 435
; ATTORNEY/AGENT INFORMATION:
; NAME: Gregg, Valeta
; REGISTRATION NUMBER: 35,127
; REFERENCE/DOCKET NUMBER: 4160.414-US
; TELECOMMUNICATION INFORMATION:
; TELEPHONE: 212-867-0123
; TELEFAX: 212-878-9655
; INFORMATION FOR SEQ ID NO: 11:
; SEQUENCE CHARACTERISTICS:
; LENGTH: 456 amino acids
; TYPE: amino acid
; STRANDEDNESS: single
; TOPOLOGY: linear
; MOLECULE TYPE: protein
US-08-709-974A-11

Query Match 27.4%; Score 751; DB 2; Length 456;

Best Local Similarity 36.3%; Pred. No. 1.9e-50;

Matches 172; Conservative 71; Mismatches 161; Indels 70; Gaps 17;

Qy 9 ETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSSTNCYD-GNTWSSTLCPDNE-C 66 

I il H :h I ::| :|:|| I :: II I I : I III || 

Db 28 EVHPQITTYRCTKADGCEEKTNYIVLDALSHPVHQVDNPYNCGDWGQKPNETACPDLESC 87 

Qy 67 AKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKN-VGARLYLMASDTT---YQEFTLL 122 

hll H : :||:| I || : | | I hlh I I h I 

Db 88 ARNCIMDPVSDYGRHGVSTDGTSLRL KQLVGGNWSPRVYLL--DETKERYEMLKLT 142 

Qy 123 GNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQC PRDL 179 

lllhllM ::|llhl III III I |: III :||||||:|| I 
Db 143 GNEFTFDVDATKLPCGMNSALYLSEMDATGARSE--LNPGGATFGTGYCDAQCYVTP 197 


Qy 180 KFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEG 239 

MM hi I hlhlMIMIh : lllh I MM 

Db 198 -FINGLGNIE GKGACCNEMDIWEANARAQHIAPHPCSKAGPYLCEG 242 

Qy 240 DGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFETSG-- 297 

I MM Ml lllll: I Ml |: I MM: :||||| | 

Db 243 AEC EFDGVCDKNGCAWNPYRVNVTDYYGEGAEFRVDTTRPFSWTQFRAGGDA 295 

Qy 298 AINRYYVQNGVTFQQPNAEL-GSYSGNELNDDYCTAEEAEFGGSSFSDKGGLTQ 350 

M I MIM : : I : : MM I I : I- I : 

Db 296 GGGKLESIYRLFVQDGRVIESYWDKPGLPPTDRMTDEFCAAT GAARFTELGAMEA 351 

Qy 351 FKKATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQS 410 

I : MM MM M MM II I h : 

Db 352 MGDALTRGMVLALSIWWSEGNDMNWLDS-- --GEAGPCDPDEGNPSNIIRVQ 399 

Qy 411 PNAKVTFSNIKFGPIGSTGNPSGGNPPGGNPPGTTTTRRPATTTGSSPGPTQSH 464 

h M Mh:M MM I : I I I : II I I M: 
Db 400 PDPEWFSNLRWGEIGST-YESAVDGPVGKGKGKGKGKAPA GDGNGKEKSN 449 



RESULT 4 
S38794 

cellulose 1,4-beta-cellobiosidase (EC 3.2.1.91) - imperfect fungus (Humicola grisea)
N;Alternate names: beta-glucancellobiohydrolase; exoglucanase
C;Species: Humicola grisea var. thermoidea
C;Date: 10-Sep-1999 #sequence_revision 10-Sep-1999 #text_change 09-Jul-2004
C;Accession: S38794; S08240; A45869
R;Radford, A.
submitted to the EMBL Data Library, June 1991
A;Reference number: S38794
A;Accession: S38794
A;Molecule type: DNA
A;Residues: 1-525 <RAD>
A;Cross-references: UNIPROT:P15828; UNIPARC:UPI000012BE0F; EMBL:X17258; NID:g2760; PIDN:CAA35159.1; PID:g2761
A;Note: this is a revision to the sequence from reference S08240
R;de Oliviera Azevedo, M.; Radford, A.
Nucleic Acids Res. 18, 668, 1990
A;Title: Sequence of cbh-1 gene of Humicola grisea var. thermoidea.
A;Reference number: S08240; MUID:90175006; PMID:2308855
A;Accession: S08240
A;Molecule type: DNA
A;Residues: 1-299,'H',301-525 <DEO>
A;Cross-references: UNIPARC:UPI00001729F6; EMBL:X17258
A;Note: the authors translated the codon CAG for residue 87 as His
A;Note: this sequence has been revised in reference S38794
R;Azevedo, M.; de, O.; Felipe, M.S.S.; Astolfi-Filho, S.; Radford, A.
J. Gen. Microbiol. 136, 2569-2576, 1990
A;Title: Cloning, sequencing and homologies of the cbh-1 (exoglucanase) gene of Humicola grisea var. thermoidea.
A;Reference number: A45869; MUID:91178527; PMID:2127803
A;Accession: A45869
A;Status: not compared with conceptual translation
A;Molecule type: DNA
A;Residues: 1-20,'R',22-34,'K',36-86,'H',88-141,'V',143-157,'Y',159-237,'QQH',241-244,'I',246-299,'H',301-525 <AZE>
A;Cross-references: UNIPARC:UPI00001729F7; GB:M64588; GB:X17258
A;Note: this sequence has been revised. See entry S08240
C;Genetics:
A;Gene: cbh-1
A;Introns: 138/1
C;Superfamily: cellulose 1,4-beta-cellobiosidase I; fungal cellulose-binding domain homology
C;Keywords: glycosidase; hydrolase; polysaccharide degradation
F;494-525/Domain: fungal cellulose-binding domain homology <FCB>

Query Match 59.9%; Score 1641.5; DB 1; Length 525;

Best Local Similarity 57.3%; Pred. No. 1.2e-91; 

Matches 294; Conservative 74; Mismatches 122; Indels 23; Gaps 8; 



Qy 1 QSACTLQSETHPPLTWQKCSSGGTCTQQTGSWIDANWRWTHATNSSTNCYDGNTWSSTL 60 

I Ihl :| II hi lh:|l I h :hllllll : INN II I 

Db 19 QQACSLTTERHPSLSWNKCTAGGQCQTVQASITLDSNWRWTHQVSGSTNCYTGNKWDTSI 78 

Qy 61 CPD-NECAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQK-NVGARLYLMASDTTYQE 118 

I I Ihllhlll I IlihIhhIII: llh llhl III : II 

Db 79 CTDAKSCAQNCCVDGADYTSTYGITTNGDSLSLKFVTKGQHSTNVGSRTYLMDGEDKYQT 138 

Qy 119 FTLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRD 178 

I IMIIhllllll : llllllllllllllllhhil I llllllllllhlllll 

Db 139 FELLGNEFTFDVDVSNIGCGLNGALYFVSMDADGGLSRYPGNKAGAKYGTGYCDAQCPRD 198 

Qy 179 LKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE 238 

Hlllhlhlll hh I I I :|:|||||||lll|::: I Mill! HI II 
Db 199 IKFINGEANIEGWTGSTNDPNAGAGRYGTCCSEMDIWEANNMATAFTPHPCTIIGQSRCE 258 

Qy 239 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTWTQFETS-- 296 

II lllllh II I llllllhl II II :||| I hIMIhllllll 

Db 259 GDSCGGTYSNERYAGVCDPDGCDFNSYRQGNKTFYGKG--MTVDTTKKITVVTQFLKDAN 316 

Qy 297 GAINRYYVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFGG-SSFSDKGGLTQFK 352 

I I hllhl : : II : hi II h llh I 

Db 317 GDLGEIKRFYVQDGKIIPNSESTIPGVEGNSITQDWCDRQKVAFGDIDDFNRKGGMKQMG 376 

Qy 353 KATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPN 412 

II H lllllhllh :||lllllhl : : III Ihl hllllhlh::|| 
Db 377 KALAGPMVLVMSIWDDHASNMLWLDSTFPV-DAAGKPGAERGACPTTSGVPAEVEAEAPN 435 

Qy 413 AKVTFSNIKFGPIGST GNPSGGNPPGGNPPGTTTTRRPATTTGSSPGPTQS 463 

: I lllhlllllll I :|IMI M llh Mill :| II 
Db 436 SNWFSNIRFGPIGSTVAGLPGAGNGGNNGGNPP PPTTTTSSAPATTTTASAGPKAG 492 

Qy 464 HYGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL 496 

: MlllhHIl I I II :||||| 

Db 493 RWQQCGGIGFTGPTQCEEPYICTKLNDWYSQCL 525



